Frontier Rebooted Part 5: Kneel Before LOD

By Shamus Posted Thursday Jun 5, 2014

Previous Post: Things that drive me nuts about OpenGL

Next Post: Frontier Rebooted Part 6: Worst-case Scenario

The most central problem to rendering any wide open space is, “How do we avoid drawing everything in the entire world?” You can see this problem at work in open world games like Grand Theft Auto. You’re navigating around a massive city. There are literally tens of thousands (maybe even hundreds of thousands) of objects around the city: Street lights, dumpsters, trash cans, newspaper stands, benches, trees, mailboxes, awnings, telephone poles, parking meters, street signs, traffic cones, trash bags, chain-link fences, and jersey barriers.

And crates. Can’t forget the crates.

I don’t care how much horsepower you have, how much memory you’ve got, or how many surfboard-sized graphics cards you glue together and jam inside your PC: Taking all of that clutter for the entire city and hurling it at the graphics hardware would be ruinous.

So the game needs some way of controlling what things get drawn and how detailed they are. The trash cans two miles away? They don’t even need to be in memory. The street lamp four blocks away? That gets drawn, but we’re going to draw a crude simplified version of it, probably a simple vertical beam with four sides. At this distance it doesn’t matter how crappy it looks, we just need a little black pole to stand in for the real thing. But this mailbox right beside the camera? That needs to be rendered in full detail.

The process of sorting this out is called Level Of Detail. It’s a complicated and interesting branch of knowledge. The trick is that the optimal LOD solution will vary a great deal based on your project. The system used by Grand Theft Auto IV is going to be very different from the one in Spore, which is again different from the one in World of Warcraft. Or Far Cry. Or Minecraft.

The thing is, in the vast majority of cases LOD is something that gets sorted out by the CPU. In my very first programming project here on the site, I spent the entire time sorting through polygons and topography, figuring out what parts were worth drawing and what parts could be simplified. But we live in a strange world now.

You know what we need? A Terrible Car Analogy:

We’ve got a pizza place at the end of a mile-long driveway. Our CPU is the cook inside. He’s handling the phone, taking orders, counting money, making pies. He’s got a fleet of, say, a dozen drivers who each have their own personal jet-powered Batmobile. This fleet is the graphics card[1]. The cook can just barely cook pies fast enough to keep all the drivers busy making deliveries. But for some reason the cook gets it into his head that he’s going to “save some time” for his drivers. He stops cooking and walks a stack of pies out to the end of the driveway so the drivers won’t have to drive as far to pick up their pies.

Maybe that was more a pizza analogy than a car analogy, but you get the idea.

In any case, this is what we’re doing when we spend CPU cycles trying to lighten the load of the graphics card.

Mostly. Usually. It depends.

It depends on how much crap you’re trying to draw. If you push enough static[2] polygons and render enough pixels, you will eventually reach the limit of what the GPU can handle. And if the CPU is spending a lot of time idle[3], then you might as well have it do something to lighten the GPU load.

Basically, we’ve got a two-part asymmetrical job being performed by a two-part asymmetrical pair of workers. On the PC, this makes for some serious coding challenges. Maybe this game is being run in the future (from the programmer’s perspective) and the user has some incredible CPU. But maybe they’re using a crappy integrated GPU that’s far behind the times. Maybe the user is running a very old computer with a slow CPU, but they’ve put a super-deluxe graphics card into it in the hopes of speeding things up. This wild and unpredictable load balance is why PC gamers want lots of graphics options to tweak. Something like FSAA is basically free for the CPU, but devours a lot of GPU power. At the same time, turning up the detail draw distance or model complexity will likely hit the CPU way harder than the GPU.

For the purposes of this project, we’re all about shoving as much stuff onto the GPU as we can. So many of my projects – being both low-tech and procedural – have been CPU heavy and GPU trivial. The goal here is to push the GPU to get a better feel for this load-balance stuff.

So our first job is to abandon the idea of wasting CPU cycles to save GPU cycles. The GPU can deal with it.

But that’s not to say we want to be completely irresponsible with our GPU. I mean, we don’t want to throw the whole 2048×2048 world at it every frame. That’s 2048² triangle pairs. Your graphics card can probably handle 8,388,608 triangles okay, but we can cut that number way, way down with some basic tricks that won’t cost us any CPU.
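To spell out that arithmetic: every grid cell is a quad made of two triangles. Here is a quick sanity check in C++. The constant names are made up for illustration and are not from the project.

```cpp
#include <cstdint>

// World dimensions from the article. The names are illustrative only.
constexpr std::int64_t kWorldSize = 2048;
constexpr std::int64_t kQuads = kWorldSize * kWorldSize;  // 4,194,304 triangle pairs
constexpr std::int64_t kTriangles = kQuads * 2;           // 8,388,608 triangles

static_assert(kTriangles == 8388608, "two triangles per grid cell");
```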

So what we do is this:

We build a mesh that’s dense in polygons in the center, but steps down in resolution away from the center. It’s a completely flat grid. It doesn’t need to be the size of the whole dang world. A 1km draw distance is pretty cool, but if the world itself is only 2km on a side it’s kind of strange. It means the instant something vanishes over the western horizon it appears to the east. So let’s make our mesh just 1km on a side.

We can pack this perfectly static, unchanging grid into a vertex buffer and forget all about it. We never need to touch it once the program is running. I know the rainbow coloring makes it kind of hard to see what you’re looking at[4]. Here is a drawing to give the general idea of how it’s built:

Hand-drawn. I forgot to cut a few rectangles into triangles, and technically the rectangles should all be cut going the same direction. But this is close enough. You get the idea.
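For the curious, here is a rough sketch of how a grid like this could be generated. It assumes concentric square bands where the cell size doubles from one band to the next. The band layout and the names GridVertex, LodBand, and BuildLodGrid are all invented for illustration. It also ignores the seams where a coarse band meets a fine one, so it will not reproduce the triangle count of the real mesh.

```cpp
#include <vector>

// One corner of a triangle in the flat grid. Height is added later by the shader.
struct GridVertex {
  float x;
  float z;
};

// A square band around the center. Hypothetical layout, not the project's mesh.
struct LodBand {
  int outer;  // half-width of the band's outer edge, in grid units
  int step;   // cell size inside this band, in grid units
};

// Builds an unindexed triangle list. Assumes each band's outer edge and the
// previous band's outer edge are both multiples of the band's step.
std::vector<GridVertex> BuildLodGrid(const std::vector<LodBand>& bands) {
  std::vector<GridVertex> verts;
  int inner = 0;  // half-width already covered by finer bands
  for (const LodBand& band : bands) {
    for (int z = -band.outer; z < band.outer; z += band.step) {
      for (int x = -band.outer; x < band.outer; x += band.step) {
        // Skip cells that a finer band has already filled in.
        const bool covered = x >= -inner && x + band.step <= inner &&
                             z >= -inner && z + band.step <= inner;
        if (covered) continue;
        const float x0 = static_cast<float>(x);
        const float z0 = static_cast<float>(z);
        const float x1 = static_cast<float>(x + band.step);
        const float z1 = static_cast<float>(z + band.step);
        // Two triangles per cell, always cut along the same diagonal.
        verts.push_back({x0, z0});
        verts.push_back({x1, z0});
        verts.push_back({x1, z1});
        verts.push_back({x0, z0});
        verts.push_back({x1, z1});
        verts.push_back({x0, z1});
      }
    }
    inner = band.outer;
  }
  return verts;
}
```

With the made-up bands {64, 1}, {128, 2}, {256, 4}, {512, 8} this covers a 1024-unit square with 53,248 cells, or 106,496 triangles. That is the same ballpark as the real mesh, but not the same layout.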

We make a vertex shader that will take this mesh as input. For each vertex:

Look at where the camera is. Round its horizontal position off to the nearest grid point[5].
Add this rounded value to the position of the vertex. This will effectively shift the grid to always be directly under the player, with the highest detail right under them.
Use this new value to figure out what part of the world this vertex is in. Then look in the heightmap, and lift the vertex up.
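The steps above can be sketched in plain C++, run on the CPU purely for illustration. The real thing lives in a vertex shader and samples the heightmap as a texture. The Heightmap struct and the function names here are assumptions, not the project's code. The snap distance of 16 comes from the footnote, and wrapping the index is what makes the world tile.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Stand-in for the heightmap texture. Hypothetical type for this sketch.
struct Heightmap {
  int size;                 // width and height in grid points (2048 in the article)
  std::vector<float> data;  // size * size elevations, row-major
};

struct Vec3 {
  float x, y, z;
};

constexpr float kSnap = 16.0f;  // move the grid in steps of 16 grid points

// Rounds a camera coordinate to the nearest multiple of kSnap.
float SnapToGrid(float v) { return std::round(v / kSnap) * kSnap; }

// Wraps an index into [0, size) so the world tiles in every direction.
int WrapIndex(int i, int size) { return ((i % size) + size) % size; }

// Shifts a flat mesh vertex under the camera and lifts it to the terrain height.
Vec3 DisplaceVertex(float mesh_x, float mesh_z, float cam_x, float cam_z,
                    const Heightmap& map) {
  const float world_x = mesh_x + SnapToGrid(cam_x);
  const float world_z = mesh_z + SnapToGrid(cam_z);
  const int ix = WrapIndex(static_cast<int>(std::floor(world_x)), map.size);
  const int iz = WrapIndex(static_cast<int>(std::floor(world_z)), map.size);
  const float height = map.data[static_cast<std::size_t>(iz) * map.size + ix];
  return {world_x, height, world_z};
}
```

Because the camera position is snapped before it is added, the grid only jumps when the camera crosses a 16-point boundary, and every vertex lands on the same heightmap samples it would have used before the jump.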

So instead of messing around with 8 million triangles, we’ve got just…

78,144. We can live with that. This is all done by the GPU. As far as the CPU is concerned we’re just rendering the exact same flat plane over and over, but once it goes through the shader it becomes this complex, seamless, infinitely tiling topography. And since we’ve got the camera position handy, it’s easy to make the land fall away in the distance, thus creating a faux-spherical world.
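One plausible way to get that falling-away effect is to subtract a height that grows with the square of the distance from the camera. I am not claiming this is the formula the project uses. It is the standard small-angle approximation for the drop of a sphere's surface, and the function name and the radius parameter are made up for this sketch.

```cpp
// Height to subtract from a vertex at horizontal offset (dx, dz) from the
// camera, for a pretend planet of the given radius. Approximates
// R - sqrt(R^2 - d^2) as d^2 / (2R), which is close when d is much less than R.
float CurvatureDrop(float dx, float dz, float planet_radius) {
  return (dx * dx + dz * dz) / (2.0f * planet_radius);
}
```

A smaller radius makes the horizon curve away sooner, so the radius is a knob to tune by eye rather than a physical constant.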

Are we rendering more triangles than we need to? Sure. A proper optimization could save us even more. But that would mean the mesh would need to be updated when the camera moved around, which would be the equivalent of our pizza cook hiking a mile to cut down on how far the Batmobiles have to drive. And yes the analogy is strange.

We need one more thing here. We need to color the landscape. Stuff touching the water should look like beach. Steep surfaces should be bare dirt instead of grass. Very steep should be rock instead of dirt. Since the landscape changes over time, we can’t just work this out at load time. We need to keep updating it. But the data doesn’t change quickly. This isn’t something that needs to be updated every single frame.

So what we do is create a background thread. It’s really low-key. It begins by downloading a copy of the heightmap. (Remember that the heightmap lives on the GPU, where it’s formed by the erosion shaders.) Then it passes over the landscape a little bit at a time, looking at height values and doing a lot of if/then/else if/else if/then/else fallback type logic to figure out if a given cell should be grass, dirt, rock, snow, beach. It comes up with a color value for each point, with slight variations so the world doesn’t get too monotonous.

When it’s done, it uploads the color values into a texture that we’ll be using on the terrain from now on. I’m calling this the color map, and it’s the same size as the heightmap. The two go together, with one defining the elevations and the other defining the color. Once the color map is uploaded, the thread downloads a fresh updated version of the heightmap and starts over. The whole cycle takes a few seconds.
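To give a feel for that fallback logic, here is a minimal sketch of classifying a single cell. Every threshold, the order of the checks, and the names Surface and ClassifyCell are assumptions made up for this example. The real rules look at more neighbor conditions than this, and the thread and GPU transfer code are left out entirely.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

enum class Surface { Beach, Grass, Dirt, Rock, Snow };

// Made-up illustrative thresholds, not the project's values.
constexpr float kWaterLevel = 0.0f;
constexpr float kSnowLine = 100.0f;
constexpr float kDirtSlope = 0.5f;  // height difference to a neighbor
constexpr float kRockSlope = 1.0f;

// Classifies cell (x, z) of a size * size row-major heightmap that wraps at the edges.
Surface ClassifyCell(const std::vector<float>& heights, int size, int x, int z) {
  auto at = [&](int cx, int cz) {
    cx = ((cx % size) + size) % size;
    cz = ((cz % size) + size) % size;
    return heights[static_cast<std::size_t>(cz) * size + cx];
  };
  const float here = at(x, z);
  const float neighbors[4] = {at(x - 1, z), at(x + 1, z), at(x, z - 1), at(x, z + 1)};

  float slope = 0.0f;          // steepest step to any of the four neighbors
  bool touches_water = false;  // true if any neighbor is at or below water level
  for (float n : neighbors) {
    slope = std::max(slope, std::fabs(here - n));
    touches_water = touches_water || n <= kWaterLevel;
  }

  // The if / else-if fallback chain: the first rule that matches wins.
  if (slope > kRockSlope) {
    return Surface::Rock;
  } else if (slope > kDirtSlope) {
    return Surface::Dirt;
  } else if (touches_water) {
    return Surface::Beach;
  } else if (here > kSnowLine) {
    return Surface::Snow;
  }
  return Surface::Grass;
}
```

The background thread would run something like this over a slice of the map at a time, turn each Surface into a slightly randomized color, and upload the result once the whole pass is finished.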

Yes, technically we could probably offload this job to the GPU. But the GPU is really inefficient at branching logic, so a lot of power would go to waste. More importantly, it would be a pain in the ass to set this up. We need to keep a lot of variables around for comparing cells. Is one of my neighbors water? Is one of them rock? Am I rock? Do I have any neighbors that are dirt that aren’t touching sand? Gah. Doing that kind of logic with a shader would be horrendous. We’d need to make another shader to hold all of these values to keep track of all these little attributes of individual cells. We’d need yet another entire shader pass just to fill in the variables so we could then do another pass for filling in the color map.

And yes, maybe we would have to suck it up and do that if this was a time-critical job. But it’s not. The color map can go for several seconds without being updated without it causing any problems. There’s no reason to kill ourselves building complex interfaces to optimize low-priority tasks.

Moreover, the point of this project is to get some experience building a variety of typical shader systems from scratch. Using a shader to do something that outlandish doesn’t really advance that goal.

Anyway. It works. I think we’re nearing the end of my list of planned features and experiments. I want to put some foliage on the terrain and muck about with shaders a bit more. Then we need to see how performance is. I’m still thinking about Oculus Rift support. I’m not planning on adding VR to this little project, but for educational purposes I want to see how difficult it is to draw the entire scene 120 times a second.

Footnotes:

[1] One of the hallmarks of a really good Terrible Car Analogy is a really complicated metaphor.

[2] In this context, “static” is just programmer talk for “doesn’t change”.

[3] Maybe this is a graphics demo with no physics, sound, AI, caching, or networking going on.

[4] I do this when designing the mesh. The rainbow makes it easy to see if triangles are improperly arranged.

[5] Every 16th grid point, actually. Otherwise the terrain feels “jittery” by changing very slightly as you move around.

Shamus Young is a programmer, an author, and nearly a composer. He works on this site full time. If you'd like to support him, you can do so via Patreon or PayPal.


50 thoughts on “Frontier Rebooted Part 5: Kneel Before LOD”

Zeta Kai says:

Thursday Jun 5, 2014 at 6:48 am

“At the same time, turning up the detail draw distance or model complexity will likely hit the CPU way harder than the CPU.”

I think that second CPU is supposed to be a GPU. Friggin’ TLAs getcha every time.

Reply
Joshua says:

Thursday Jun 5, 2014 at 7:01 am

“you will eventually reach the limit of what the GPU and candle.”

can handle?

Reply
1. ET says:
  
  Thursday Jun 5, 2014 at 9:15 am
  
  Dangit; Ninja’d! ^^;
  
  Reply
2. syal says:
  
  Thursday Jun 5, 2014 at 11:43 am
  
  And that’s not even considering if someone manages to take candle. Then you’re really in trouble.
  
  Reply
  1. MrGuy says:
    
    Thursday Jun 5, 2014 at 2:42 pm
    
    You can’t get ye candle.
    
    Reply
  2. Neil Roy says:
    
    Thursday Jun 5, 2014 at 7:33 pm
    
    You no take candle! ;)
    
    Reply
3. SteveDJ says:
  
  Friday Jun 6, 2014 at 12:55 pm
  
  All these posts on spell/grammar checking, and nobody noticed “Our CPU is the cook is inside.” I’m disappointed in this crowd… :-)
  
  Reply
Robyrt says:

Thursday Jun 5, 2014 at 7:06 am

How about a terrible office analogy?
The boss wants to have a meeting in the conference room, but he wants everybody to bring their computers. So 30 minutes before the meeting, he goes around to everyone’s desk, unplugs their laptop, and brings it to the conference room. “It saves time! My employees don’t have to unplug their laptops for themselves anymore!”

Reply
1. Eruanno says:
  
  Thursday Jun 5, 2014 at 9:47 am
  
  But… there aren’t any cars in this analogy… :(
  
  Reply
  1. Eathanu says:
    
    Thursday Jun 5, 2014 at 1:58 pm
    
    Cars hurt the environment. Do you really want this analogy to bring about Global Analogy Warming?
    
    Reply
  2. Exasperation says:
    
    Thursday Jun 5, 2014 at 3:23 pm
    
    Car analogies are the “rusted out junker up on cinder-blocks in the front yard” of the analogy world. They’ve been around for just about as long as there have been cars, are a huge eyesore that everyone except the ones responsible for them (and sometimes even them) wants to be rid of, don’t accomplish anything useful, will probably never go away, and a family of squirrels has taken up residence in the seat cushions.
    
    Reply
    1. Epopisces says:
      
      Thursday Jun 5, 2014 at 8:03 pm
      
      Car analogy^2. Aka ‘Car Squared’. Aka Nissan Cube?
      
      Also (sorry Shamus!) hidden text #1 needs attention (once = one).
      
      Reply
Abnaxis says:

Thursday Jun 5, 2014 at 7:28 am

On the color map: my understanding is, that branching logic is inefficient on shaders because, no matter what, one of those cores is going to go through the worst case scenario, and have to test every single if/else.

For example, by using a conditional to say “only draw pixel X if it’s in front of the camera” would help if the CPU was doing the drawing, because it would eliminate half the pixels. It doesn’t help the shader, however, because all the cores are already drawing all the pixels in front of the player in parallel, so all the cores that would have drawn pixels behind the player but aren’t because of the test just sit around with their hands in their pockets and wait for the pixels in front. (And yes, I know there are mechanisms for this that do work in parallel, this is just a what-if)

The upshot of all this is: would it really be inefficient to use the shader to draw the color map, since every pixel has to have SOME color, the conditionals are just deciding which one? You aren’t using the conditionals for culling purposes so do they really make a difference? Or is there something else to this optimization I don’t get?

Reply
1. guy says:
  
  Thursday Jun 5, 2014 at 9:54 am
  
  It’s important to remember that processors do not necessarily have identical architecture. GPUs are simply worse at branching than CPUs.
  
  Reply
  1. Abnaxis says:
    
    Thursday Jun 5, 2014 at 10:46 am
    
    But is the architecture really that much worse for branching, or have enough programmers been burned by poorly designed branching shader code, that the conventional wisdom is “doooooooon’t do that!”
    
    Reply
    1. Knut says:
      
      Thursday Jun 5, 2014 at 11:29 am
      
      It’s because of the architecture. GPUs are designed for doing maths as fast as possible on as many parallel threads as possible, not conditional logic. It’s a deliberate design choice.
      
      Reply
    2. guy says:
      
      Thursday Jun 5, 2014 at 11:40 am
      
      GPUs don’t necessarily even have branch prediction, much less good branch prediction. The performance costs resulting from that are pretty steep; if a branch isn’t accurately predicted the core will have to sit idle while the next instruction is fetched.
      
      Reply
      1. nm says:
        
        Thursday Jun 5, 2014 at 4:40 pm
        
        This. A branch misprediction in a simple processor means the entire pipeline gets flushed and nothing comes out until (number of stages in pipeline) cycles later. CPUs, particularly modern superscalar ones, can play all sorts of tricks to make branch mispredictions both less likely and less bad.
        
        Since all that fancy branch prediction logic costs silicon, the GPU makers decide not to include it. That lets them stick more GPU cores on their chips so they can do what they do better and faster and with less power.
        
        Reply
      2. FacelessJ says:
        
        Thursday Jun 5, 2014 at 8:29 pm
        
        I wonder if this is a case where the conditionals could be eliminated entirely.
        
        bools equate to 0 or 1 (In both C++ and GLSL). So one could do something like this:
        
        bool isSteep = gradient > 0.5;
        bool isNearWater = neighbour == water;
        //etc
        
        And then you could have some (admittedly potentially a little bit complex) math for deciding the final colour based on the bools.
        
        Could be something like
        colour = isSteep * steepColour + isNearWater * waterColour;
        where the bools are just picking from predefined colours (Although, if multiple bools are true, this would not work). Probably a more complex equation would be needed, but how you’d set it up would depend on exactly what colours, how many colours and what combinations they can be in.
        
        If Shamus could create a suitable equation for determining the final colour, he could form the colour map without a single branch.
        
        Sidenote: This can also be done in regular CPU code to avoid the cost of branch misprediction (My favourite is for counting elements which match a certain criteria. count += criteria :) ). However, just be aware that you’ll be calculating the then block (and the else block if you have it) every time. If this is complex (more than about 6-20 cycles of instructions, depending on platform) then it is cheaper to just do the pipeline flush. If only multiplication short-circuiting was a standard thing…
        
        Reply
        
        Decius says:
        
        Thursday Jun 5, 2014 at 10:45 pm
        
        Take all of the binary questions you want, give them each a unique index integer, calculate the value of 2^index, multiply the truth value of the question by that value, and then sum those products.
        
        The sum will be unique to that combination of binary questions.
        
        The process can be duplicated more easily by making each binary question set one bit in a byte/word/other number.
        
        I think the tricky part is answering questions like “Do I have any neighbors that are dirt that aren't touching sand? Do I have two or more neighbors that are dirt that aren't touching sand? Do I have exactly five neighbors that are dirt that aren't touching sand?” (Although questions like “Is the number of neighbors I have touching sand modulo 2 equal to 1?” “Is the number of neighbors I have touching sand modulo 4 equal to 3 or 4” can be the binary propositions in question…)
        
        Reply
      3. Someone says:
        
        Friday Jun 6, 2014 at 4:49 am
        
        The problem isn’t really the branch predictor, even though CPUs have much better predictors than GPUs. The problem is that GPUs are made for executing the SAME instruction in parallel.
        
        Each GPU multiprocessor (kind of like a CPU core) is capable of running 8 to 32 instructions simultaneously, as long as those instructions are exactly the same. In other words, a GPU multiprocessor is capable of executing 8 to 32 additions simultaneously. However, if these instructions are different, they will be serialized, meaning that only one of the 8 to 32 cores will be used in each multiprocessor, the others will stall. So, if you try to have half the cores run an addition, while the other half runs a multiplication, the multiplication half will stall while the first half does the addition and vice-versa.
        
        In this context, different instructions means that they are in different lines of the code (i.e. they have different memory addresses). You should think of GPU multiprocessors as multicores that only have a single Program Counter pointer. They can only run one individual instruction at a time, but can run it across multiple data simultaneously.
        
        This is the real problem of branching code: when the execution path diverges, each thread will access different instructions, and the GPU execution will be serialized. You are only using a very small portion of the available computing power.
        
        Reply
        
        Abnaxis says:
        
        Friday Jun 6, 2014 at 7:30 am
        
        I actually got curious enough to go look this up on Google, and found a GPU Gems chapter about it.
        
        It was an interesting read. To me, it sounds like what you’re describing is a SIMD architecture, correct? I don’t know why, but it makes more sense the way you describe it than the article did, even though you’re saying the same thing, so thank you for that.
        
        If I follow correctly, this means that even though every branch executes the same instructions (store byte X at address Y) with different data, they would still be serialized since the instructions reside in different areas of memory. Basically, my understanding that the GPU will always behave according to the worst-case branch is wrong–what actually happens is that, no matter what, the branches always get serialized (at least on SIMD, MIMD doesn’t have that restriction, I guess?).
        
        Still, it strikes me as one of those gut reactions when someone just automatically says “don’t ever branch on the GPU, the CPU will always do it better.” I know nobody in software development wants to spend a single minute more thinking about code than they need to to meet performance requirements, but sometimes all it seems like it would take is a little effort to make code more appropriate for a parallel architecture, if they didn’t automatically throw out the idea without considering it.
        
        This isn’t a dig at Shamus or anything, I just see a lot of programmers out there repeating mantras like these, without considering the context first (see also, “premature optimization is the root of all evil” any time someone talks about improving their code)
        
        Reply
        
        Retsam says:
        
        Friday Jun 6, 2014 at 9:02 am
        
        This is one of the most educational comment threads that I’ve ever read…
        
        Reply
        
        Someone says:
        
        Friday Jun 6, 2014 at 9:05 am
        
        You are right, GPUs are a SIMD architecture (even though NVIDIA likes to call it a SIMT, single instruction multiple threads). And you understand the problem correctly. It is not that branches are inherently inefficient in the GPU, it’s just that they usually lead each thread to different memory zones, and thus the execution will be serialized, making the GPU worthless.
        
        Please note that branching can have ZERO impact on the GPU’s performance. If all threads of a multiprocessor jump to the same address, there will be no divergence in the execution path, and all the instructions can keep being executed simultaneously.
        
        In general, it is good practice to avoid branching code on GPUs, but one shouldn’t be a fundamentalist about it. If in 50 cycles you lose 2 or 3 because you had to make a small branch, that isn’t really worth the hassle.
        
        Reply
        
        Abnaxis says:
        
        Friday Jun 6, 2014 at 9:23 am
        
        So…to draw it back to the example, if the terrain generated roughly approximates earth and all we’re worried about is “grass, rocky-slope, water, or beach,” ~65-70% of the tiles will be water, ~25-30% of the tiles will be grass, and the rest will be rocky slopes or beach. So out of every ten points I would be shoving at the GPU, 6-7 will jump to one address, 2-3 will jump to another, and the last 1 or so will jump to a third. So it’s about a little less than half-optimal.
        
        Although…again that depends on what order the GPU does things in. If I’m handed a chunk of water, odds are next neighbor is also going to be water. If I were to write code to naively process the color map in parallel, would the multiprocessors normally get chunks of points that are in a close neighborhood of one another, or would I have to assume they’re all just getting uniformly random-access data?
        
        I’m betting that’s one of those questions you never know the answer to for sure, so you have to assume the worst…
        
        Incidentally, thank you for your answers. I’ve been thinking about playing with CUDA, and your information has been very edifying.
        
        Reply
        
        Someone says:
        
        Friday Jun 6, 2014 at 11:50 am
        
        “would the multiprocessors normally get chunks of points that are in a close neighborhood of one another?”
        
        You pretty much have to do that. There is a very big performance penalty if you are not accessing data in close proximity. In earlier architectures, each thread in a multiprocessor would have to access contiguous memory positions, otherwise the memory access would serialize and the cores stall. Fortunately, the newer architectures (from Fermi onwards, I think) have caches and are not so restrictive in the memory access pattern. Nonetheless, the point still stands: you have to feed the threads with data in close proximity if you want any kind of performance. The GPU main memory has very high bandwidth, but a very high latency. You can hide this latency if you access data in close proximity. This is usually called coalescing. Optimizing the memory access pattern is probably THE most important optimization on a GPU. A poorly designed access pattern can turn a 500 core machine into an 8 core machine with really weak cores.
        
        As for the terrain example:
        I would argue that more important than the number of branches is the length of each. As I stated before, the problem with branches is the divergence in the execution path. If each branch path is a single instruction long, then you are introducing a delay of a single cycle per branch (assuming all branches are predicted accurately, obviously). Depending on the application (and the number of branches), this might not be a big performance penalty.
        
        Reply
2. Purple Library Guy says:
  
  Thursday Jun 5, 2014 at 11:30 am
  
  This CPU stuff he described with all the conditionals isn’t for deciding what does and doesn’t get drawn, it’s for deciding what colour everything is. It’s a good idea if the map is already coloured before you turn to look at it, otherwise colouring would have to get done on the fly.
  The point is that what colour a piece of terrain is depends on a bunch of conditions, and it only has to change at the speed of erosion, so it’s OK if the slow but able-to-handle-complexity CPU does it.
  
  Reply
Volfram says:

Thursday Jun 5, 2014 at 8:04 am

Hmm. Terragen Classic used a system where you could set pixel color based on height and slope. This might be able to be done faster than using “if-else” loops. Load a bunch of colors into a lookup array, get the height of a point, divide that by the size of the lookup array, cast to integer, use integer as array index for color.

Slope is a little harder, just because calculating slope is a little harder, but you do basically the same thing. Calculate the angle of the vertex normal from vertical, divide by array size, cast, use as index.

It looks like your color algorithm is basically only making beaches out of the lowest flat areas of the map anyway. It wouldn’t have as many options as the color algorithm you’re using on the CPU, but given my experiences with Terragen, it would be surprisingly flexible and effective.

Reply
WILL says:

Thursday Jun 5, 2014 at 9:20 am

Instead of calculating the type of terrain on the CPU, consider using Tri-Planar procedural mapping in Section 1.5 in this article (which is amazing and you probably already have read).

Again, no reason to bother the CPU with a texturing job – GPUs do it better, faster and easier in almost every case.

Reply
1. Abnaxis says:
  
  Thursday Jun 5, 2014 at 9:29 am
  
  Isn’t tri-planar mapping just about making sure the texture is applied from the correct angle, not so much about choosing which type of terrain texture goes where? Or is there nuance I’m missing? I think Shamus talked about that here.
  
  Any time I see a reference to GPU Gems, it makes me think of ProcWorld, which I am woefully behind on reading…
  
  Reply
2. ET says:
  
  Thursday Jun 5, 2014 at 9:29 am
  
  That’s not a relevant technique to the problem Shamus is facing. It’s dealing with a pre-existing texture, where the problem is finding texture coordinates to use, so that it looks like a seamless, repeating texture from any angle. Shamus is trying to make the texture, based on what the terrain is, and the coordinates for the texture are the easy part.
  
  Reply
  1. WILL says:
    
    Thursday Jun 5, 2014 at 9:44 am
    
    With some clever algorithms you can generate the texture itself for different materials and then you can pull from world coordinates (which you already have as the heightmap) to determine whether you use snow, rock, sand, etc…
    
    It’s certainly doable, maybe a bit heavy on the vram use, but you save massive amounts of cpu overhead.
    
    Reply
    1. guy says:
      
      Thursday Jun 5, 2014 at 10:08 am
      
      In this case, though, it sounds like the terrain calculation is based on adjacency rather than absolute position.
      
      Reply
  2. Shamus says:
    
    Thursday Jun 5, 2014 at 10:29 am
    
    This is correct.
    
    Reply
Tim Keating says:

Thursday Jun 5, 2014 at 10:50 am

Props for the OMM reference. Damn, I miss that site.

Reply
1. ET says:
  
  Thursday Jun 5, 2014 at 4:53 pm
  
  Flipping through that site reminded me just how old System Shock 2 is. ^^;
  
  Reply
Elec0 says:

Thursday Jun 5, 2014 at 11:47 am

Are you planning on releasing the source at some point to help with the OpenGL shader examples that exist online?

Reply
Ilseroth says:

Thursday Jun 5, 2014 at 12:13 pm

So out of pure horrible sticking to topic… how challenging would it be to apply the knowledge you have gained regarding shaders to Good Robot? Granted fixing that doesn’t deal with your personal qualms with the replay value of the game, but at the very least if the technological issues were less prevalent you may be more willing to put more time into that particular project.

Or were you at all considering utilizing your already constructed animation/mesh importer for the project frontier. You spoke of adding foliage in this post, is your intention to use the same tree generation you had used previously, or to generate something new?

In any case, while I have done a bit of programming, game design has always been my key interest with regards to game development, so while your programming posts are exciting and interesting, when you start getting into building replayability into a game or really introducing any game system I get very interested.

Or hell, if you have had enough of Project Frontier and/or Good Robot (I dare say I remember the words “sick to death” or something along those lines), you could always consider using your newfound knowledge on a different project. You could take the viewpoint of “Wasted Time” but in all honesty, learning how to build a simple elegant engine and construct a content pipeline in a reasonable and timely manner is probably something major developers should stop and do at some point.

Reply
Daemian Lucifer says:

Thursday Jun 5, 2014 at 2:18 pm

Don't know if you've noticed, but judging by that screenshot in that tweet, you seem to have files waiting to be burned to disc. You should maybe check that out.

Reply
Alan says:

Thursday Jun 5, 2014 at 3:46 pm

Why is FSAA basically free for a CPU? I’d expect it to be pretty expensive for a CPU. I suspect I’m overlooking something.

Reply
1. Shamus says:
  
  Thursday Jun 5, 2014 at 4:36 pm
  
  As I understand it, with FSAA you just render to a texture (instead of to a viewport), then render that texture to the viewport. So you’re rendering one big quad. The expense is the shader that you use when rendering this one quad.
  
  Reply
2. Roger Hågensen says:
  
  Friday Jun 6, 2014 at 5:42 am
  
  Anti-aliasing in its basic form (which is also the most processing-intensive one) is to render everything at twice, thrice, four times (or more) the resolution.
  
  Simply explained, if the viewport is 1920×1080 and you use 4x SSAA, the scene is rendered at 3840×2160 (twice the width and twice the height, so four samples per final pixel) and is then downsampled to 1920×1080.
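
To make the downsampling step concrete, here is a minimal sketch in plain Python of a box-filter downsample. It assumes the scene has already been rendered as a grid of grayscale values at a multiple of the target resolution. The function name `downsample` and the nested-list image format are illustrative assumptions, not anything taken from a real renderer.

```python
def downsample(image, factor):
    """Average each factor x factor block of a supersampled image.

    `image` is a list of rows of grayscale floats (a hypothetical format);
    its width and height must be multiples of `factor`.
    """
    height = len(image)
    width = len(image[0])
    if height % factor or width % factor:
        raise ValueError("image size must be a multiple of factor")
    out = []
    for y in range(0, height, factor):
        row = []
        for x in range(0, width, factor):
            total = 0.0
            # Sum every rendered sample that lands in this output pixel.
            for dy in range(factor):
                for dx in range(factor):
                    total += image[y + dy][x + dx]
            row.append(total / (factor * factor))
        out.append(row)
    return out


# A hard black/white diagonal edge rendered at 4x4, displayed at 2x2.
hi_res = [
    [1.0, 1.0, 1.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
low_res = downsample(hi_res, 2)
```

With a factor of 2 per axis, each output pixel averages four rendered samples, so the pixels the edge passes through come out as intermediate grays instead of a hard step.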
  
  With the new 4K and 8K monitors you can just set a higher screen resolution and avoid the downsampling stage, but you need more bandwidth from the GPU to the display in that case.
  
  If you have a powerful GPU, the advantage of anti-aliasing is that there is no extra bandwidth cost (neither from CPU to GPU nor from GPU to display).
  
  You can manually do this form of anti-aliasing by taking a high-resolution image and resizing it down (using a decent-quality resizer, which most image software should have) to, say, 1920×1080 if that is what your monitor is.
  
  Since rendering at a higher resolution and then resizing is expensive, there are other ways to do anti-aliasing.
  SMAA and FXAA are methods that use a shader to detect edges and smooth/blur them instead of rendering things “bigger”.
  
  Maybe we’ll see Shamus mess with that (since he’s doing shader focused stuff).
  
  http://blog.codinghorror.com/fast-approximate-anti-aliasing-fxaa/
  
  And the (said to be even better)
  http://www.iryoku.com/smaa/
  
  Both URLs link to the shader sources and documentation.
  FXAA was invented by the guys at Nvidia, and SMAA by the guys at Crytek.
  
  Reply
Piflik says:

Thursday Jun 5, 2014 at 3:56 pm

“Yes, technically we could probably offload this job to the GPU. But the GPU is really inefficient at branching logic, so a lot of power would go to waste.”

You could technically do this without any branching in a PixelShader using the Height Map and the Normal Map. The blue channel of the Normal Map is roughly equivalent to the terrain’s slope, and you can easily calculate a blend weight to blend two textures/colors together. I did this last year at university for a DirectX project (so the shader is HLSL), since I didn’t want to have a huge color map on my procedural terrain (simple Diamond-Square), but also didn’t like the bad resolution, so I did the tiling on the GPU and calculated the weights accordingly.

Posting the relevant code here, if anyone is interested, but it is quite trivial, if you know the blend equation for 4 different alphas XD

(the vertex shader also has some coordinate transformations to bend the terrain into a sphere, that’s the reason for the seemingly strange height calculation at the top. Didn’t have the heightmap, since the height was stored in the vertex buffer and the Pixelshader’s input didn’t have a value for the height…)
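
Since the idea is just arithmetic on the normal, a small sketch may help. This is plain Python rather than the HLSL described above, and the thresholds, colors, and function names are made-up values for illustration. It assumes a unit-length normal whose z component (the "blue" channel) is 1.0 on flat ground and falls toward 0.0 on vertical faces.

```python
def smoothstep(edge0, edge1, x):
    """Clamped smooth ramp, the same formula as the GLSL/HLSL smoothstep."""
    t = (x - edge0) / (edge1 - edge0)
    t = max(0.0, min(1.0, t))
    return t * t * (3.0 - 2.0 * t)


def lerp(a, b, t):
    """Linear blend between two RGB tuples."""
    return tuple(ca + (cb - ca) * t for ca, cb in zip(a, b))


def terrain_color(normal_z, grass, rock, flat_at=0.9, steep_at=0.6):
    """Blend grass toward rock as the surface gets steeper.

    The weight is computed arithmetically, with no if/else on the slope.
    `flat_at` and `steep_at` are hypothetical tuning values.
    """
    rock_weight = 1.0 - smoothstep(steep_at, flat_at, normal_z)
    return lerp(grass, rock, rock_weight)


GRASS = (0.2, 0.6, 0.2)
ROCK = (0.5, 0.5, 0.5)
flat_ground = terrain_color(1.0, GRASS, ROCK)
cliff_face = terrain_color(0.0, GRASS, ROCK)
```

Extending this to four materials means computing one weight per material in the same way and normalizing them so they sum to one.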

Reply
1. Volfram says:
  
  Friday Jun 6, 2014 at 10:32 am
  
  That’s an even faster way of doing what I suggested above, with all of the advantages plus a couple more.
  
  Cool!
  
  Reply
Taellosse says:

Thursday Jun 5, 2014 at 8:37 pm

This is not directly on topic, but a weird little quirk I’ve noticed that I wanted to ask about. Those inline footnotes you’ve taken to putting in your posts (which are neat, by the way – much better than normal footnotes since they require no scrolling) behave oddly in the RSS-feed version of them (or, at least, they do in Feedly). They appear normally, but with the wrong number – usually starting with a number between 3 and 5, then proceeding in order from there. They also display the footnote text with just a mouseover, instead of a click, but that’s not important. Any idea why that happens?

Reply
The Schwarz says:

Thursday Jun 5, 2014 at 11:47 pm

Oh LOD what a terrible pun.

Reply
Caffiene says:

Friday Jun 6, 2014 at 5:13 am

“Is one of my neighbors water? Is one of them rock? Am I rock? Do I have any neighbors that are dirt that aren't touching sand? Gah. Doing that kind of logic with a shader would be horrendous.”

I could be missing something obvious (and probably am… It’s Friday night here), but I remember a quite elegant 2D solution posted (or linked) on this site that used bit addition to decide on suitable tiles for a platformer level based on the surrounding tiles. Is there a reason that a version of this wouldn’t work as a shader?

It ‘seems’ possible, but I have the feeling I’m forgetting something I should know.
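
For reference, the bit-addition trick usually looks something like the sketch below. This is an assumed reconstruction of the general technique, not the specific code that was linked: each of the four neighbours contributes one bit, and the sum indexes a table of 16 tile variants. The grid format and all names are hypothetical.

```python
# Bit value for each neighbour; the sum (0-15) picks one of 16 tile variants.
NORTH, EAST, SOUTH, WEST = 1, 2, 4, 8


def is_solid(grid, x, y):
    """Return True for a '#' cell; anything outside the grid counts as empty."""
    return 0 <= y < len(grid) and 0 <= x < len(grid[y]) and grid[y][x] == "#"


def tile_index(grid, x, y):
    """Add one bit per solid neighbour instead of walking an if/else chain."""
    return (
        NORTH * is_solid(grid, x, y - 1)
        + EAST * is_solid(grid, x + 1, y)
        + SOUTH * is_solid(grid, x, y + 1)
        + WEST * is_solid(grid, x - 1, y)
    )


level = [
    ".#.",
    "###",
    "...",
]
center = tile_index(level, 1, 1)  # neighbours to the north, east and west
corner = tile_index(level, 0, 0)  # neighbours to the east and south
```

The same sum could in principle be built in a shader from four texture lookups. Whether it can express rules like "dirt that isn't touching sand" is a separate question, since those depend on more than which neighbours are present.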

Reply
Paul Spooner says:

Friday Jun 6, 2014 at 9:13 am

Yay! More coding posts!
I might be missing something, but wouldn’t it be easy to calculate the color map while doing the erosion? You’ve already got all the variables loaded, height, slope, neighbor’s height and slope…
Then again, all I know about shaders is what I’ve read in this series, so you undoubtedly know what you’re doing.
Glad to see the coding posts continue!

Reply
Owen Shepherd says:

Friday Jun 6, 2014 at 4:23 pm

You increased the tessellation of the center of your mesh manually, hence producing a simple LOD system.

So, while you’re having shader fun… Have you thought about messing with the (GL 4/DX 11) tessellation shaders?

There are two stages in play, and a little fixed-function block in between. The first stage (tessellation control shader) receives primitives (say, GL_QUADs – they disappeared in GL3, then GL4 brought them back exclusively for the tessellator input) and basically has the job of determining how many times to tessellate each primitive. The second stage (tessellation evaluation shader) is run for each /output/ primitive, and is responsible for deciding how to tessellate it. You can probably find some good examples of TCS/TES shaders online.

You could then, of course, get dynamic tessellation whereby a cliff right in front of your face might have triangles only a few pixels across.
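
The per-patch decision made in the control stage can be sketched outside of GLSL. The snippet below is plain Python with invented distances and level limits; it only illustrates one possible distance-to-level mapping and is not a real tessellation control shader.

```python
import math


def tess_level(camera, patch_center, near=10.0, far=500.0,
               max_level=64, min_level=1):
    """Map camera distance to a subdivision count.

    Patches at or inside `near` get `max_level`; patches at or beyond `far`
    get `min_level`. All four limits are hypothetical tuning values.
    """
    distance = math.dist(camera, patch_center)
    # 0.0 at or inside `near`, 1.0 at or beyond `far`.
    t = max(0.0, min(1.0, (distance - near) / (far - near)))
    level = max_level + (min_level - max_level) * t
    return max(min_level, round(level))


camera = (0.0, 0.0, 0.0)
close_up = tess_level(camera, (0.0, 0.0, 5.0))
horizon = tess_level(camera, (0.0, 0.0, 1000.0))
```

A linear falloff is the simplest choice. A mapping based on the patch's projected screen size would track "triangles only a few pixels across" more directly, at the cost of a little more math per patch.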

Reply
default_ex says:

Sunday Jun 8, 2014 at 12:33 pm

I’ve had great success with using focal calculations similar to what a modern camera uses to find focal points where detail tops out. What I found experimenting with that is that as long as enough detail is focused in the intersecting points of a Fibonacci curve, the player’s eyes will make up the difference. By intersecting points I refer to the squares one draws to form the vertices for a Fibonacci curve; a minimum of 12 are required for your average 20ish-inch monitor with the player sitting roughly 2ft from it.

Reply
