Attention game reviewers: Entertain me! Dance for my amusement! Say something witty! Engage in frivolous japery to wring a smile from my stony visage.
No? Then sod off.
People keep telling me they enjoy this series, but they can’t possibly be enjoying it as much as I am. It’s nice to have material that’s written ahead of time and to know I’ll be able to get the shots I need from the game. I seem to be coming to some sort of truce with Garry’s mod. Perhaps someday I’ll attain the lofty goal of becoming “competent” with the thing.
More Performance Enhancements
The final tally: The program is taking up my entire 1920×1200 desktop and runs at over 200fps. It goes down to ~90 when bloom lighting is on and becomes very fill-rate sensitive (finally! it’s nice to know the GPU has some sort of limits) but I’m happy enough with performance now.
The biggest drain on performance now is bloom lighting, which is optional. I can make bloom run twice as fast, but only by making it look a little less appealing. In fact, it’s only now that I’m questioning if I’m doing bloom “right”.
A Bit About Bloom
I’ve never looked up how other programs generate bloom lighting. I saw bloom in action for the first time when I played Deus Ex 2. I looked at the blobby artifacts it created and made some assumptions about how it was done. Here is how I thought it worked:
(Note that the example bloom I’m using here is a simulated effect, generated in Paint Shop Pro. The artifact I want to show you is really visible in motion, but actually kind of subtle in a still frame, so I’m exaggerating the effect for demonstration purposes.)
This is the full view of the scene at a given instant during the run of the program:
| The city, without bloom. |
But before I render that, I render a lower-resolution version of the scene into a texture:
| The bloom buffer, if you will. It’s actually not THIS pixelated, but you get the idea. |
After I have a low-res version of the scene saved in a little texture, I render the real, full screen deal for the viewer. (The first image.) Once I have that rendered, I take the pixelated bloom buffer and blend it with the scene using a “brighten filter”. People using Photoshop call it the “Screen” effect, I believe. OpenGL programmers call it glBlendFunc (GL_ONE, GL_ONE);.
| Split image. On the left side, the bloom is pixelated. On the right, it’s smooth. |
What you end up with is the left half of the above image: big glowy rectangles over bright areas of the scene, instead of blurry circular blobs of light. Again, it looks fine in stillframe, but when the camera is flying around the scene the rectangles tend to stick out.
But you can soften the rectangles out by rendering the bloom layer more than once, and displacing it slightly each time. This fuzzes out the edges of the rectangle and gives it the dreamy glow I want. It also means blending a huge, screen-covering polygon multiple times. As I mentioned in my previous post, blending is (or can be) slow.
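As a rough illustration of the steps above, here is a CPU-side model of the idea in Python. It is only a sketch: the real program does this on the graphics card with textured polygons, and the function names, the `offset` parameter, and the choice to weight each pass so the passes sum to one full-strength layer are all assumptions made for the example, not details from the program itself.

```python
# Illustrative sketch only: a CPU model of the bloom passes described above.
# Images are lists of rows of brightness values in the range 0.0 to 1.0.

def downsample(image, factor):
    """Average each factor x factor block into one low-res "bloom buffer" pixel."""
    h, w = len(image), len(image[0])
    small = []
    for by in range(0, h, factor):
        row = []
        for bx in range(0, w, factor):
            block = [image[y][x]
                     for y in range(by, min(by + factor, h))
                     for x in range(bx, min(bx + factor, w))]
            row.append(sum(block) / len(block))
        small.append(row)
    return small

def additive_blend(scene, bloom, factor, offset=(0, 0), weight=1.0):
    """Add the stretched low-res bloom buffer onto the scene, clamping at 1.0.

    This models glBlendFunc(GL_ONE, GL_ONE): result = source + destination.
    offset shifts the bloom layer by whole pixels (hypothetical parameter).
    """
    h, w = len(scene), len(scene[0])
    ox, oy = offset
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # Nearest-neighbour lookup is what produces the big rectangles.
            sy = min(max(y - oy, 0), h - 1) // factor
            sx = min(max(x - ox, 0), w - 1) // factor
            row.append(min(1.0, scene[y][x] + weight * bloom[sy][sx]))
        out.append(row)
    return out

def bloom_pass(scene, factor=4, offsets=((0, 0),)):
    """Blend the bloom buffer once per offset to fuzz out the rectangle edges."""
    bloom = downsample(scene, factor)
    weight = 1.0 / len(offsets)  # assumption: passes share one layer's brightness
    result = scene
    for offset in offsets:
        result = additive_blend(result, bloom, factor, offset, weight)
    return result
```

Calling `bloom_pass(scene)` gives the hard-edged rectangles, while something like `bloom_pass(scene, offsets=((0, 0), (2, 0), (0, 2), (2, 2)))` spreads the glow past the block edges. Each extra offset is another full-screen blend, which is where the cost comes from.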
Is this how games do bloom? I don’t know. (Actually, modern games will invariably be using pixel shaders to achieve this effect, which will be faster and probably let them soften those edges in a single draw. But I’m bull-headed and insist on building this thing using stone knives and bearskins.) At any rate, there is an annoying but unsurprising trade-off between performance and quality going on here, and the “sweet spot” between the two is guaranteed to be very different for different hardware. Bloom may well be unusably slow on hardware from just two GPU generations ago. All I can do is give users a way to turn it off and see how it goes.
Feedback & Performance Suggestions
Some people made some great suggestions on how the program could be further sped up. The use of vertex buffers was suggested. A vertex buffer lets the program gather up all of the geometry in the world and stick it right on the graphics card. (Assuming the graphics card has the memory for it, which isn’t a problem in my case.) This would eliminate the most expensive thing my program has to do, which is shoveling geometry to the GPU as fast as it can. However, I’m pretty sure integrated graphics cards don’t support them. (Actually, data on what hardware supports them and what doesn’t is as scarce as copies of Duke Nukem Forever.) Since I can’t count on all users having VB support, I’d have to use two rendering paths: The one I have now for people with low-end hardware, and the super-duper fast one for people with newer hardware. This means people with great hardware will go even faster still, and people with low-end hardware will see no benefit. Adding a bunch of code to push my framerate up to 500fps while doing nothing to help the guy chugging along at 20fps is a brilliant way to expend effort in such a way so as to not benefit anyone. So let’s skip that. (This is assuming I’m right that there are computers less than five years old that don’t support VB’s.)
Someone else suggested that a quadtree would be an ideal way to handle the culling I was doing in my last post. It’s tempting, but that would be a lot of work and I’ve already hit my vaguely defined goal.
I know I promised a wrap-up to this series this week, but the tiresome impositions of my mortal form have frustrated my progress. Also I played a bunch of Left 4 Dead. But I am actually done working on the thing. I just need to write about the final steps and release it. My plan is to release it two ways:
1) The source and project files, so others may tinker or point and laugh.
2) As a Windows screensaver.
Next time: Finishing touches.
Insomnia is an interesting malady. It’s pretty common. (64 million Americans suffer from it on a “regular basis” each year. Which works out to, what? 1 in 5? Something like that.) I find it interesting because I get it pretty frequently, and it doesn’t seem to have any of the usual triggers associated with it. Yesterday was almost exactly the same as the day before. I woke at the same time, worked the same hours, knocked off work, spent time with the family, wrote on my blog, and played Left 4 Dead. It was a pretty good day, so I didn’t mind having it twice in a row. I even ate the same foods. No change in stress levels. (The only difference is that I was out for a routine dentist appointment yesterday, and I can’t imagine how that could have any effect.) But last night I never went to bed. The sun is up. Birds chirping. Still not sleepy.
I’d like to think this means I have some special, exotic case of insomnia. But the truth is that we probably just don’t understand it very well. I’ll bet a lot of insomnia like mine gets blamed on “stress”. It’s pretty hard to get through a day on this planet without a little stress at some point. You had stress? Oh, that must be what caused your insomnia.
Right now I feel like an old laptop that’s been left on and left unplugged. I’m idle now, but at some point the battery is going to fail and I’ll crash.
I didn’t finish a post for today, but I made this for some reason:
![]() |
Brains are weird.
Time to optimize the program. If you’ll recall, it’s running at (maybe) 30fps. I want it running at 100fps or better. The big thing that needs to be done is to take a lot of the weight off of the CPU. It’s thrashing through 4,000+ buildings every time it draws the scene, gathering up polygons and sending them to the renderer. Let’s see if we can’t cull those numbers.
The most obvious way to take some of the load off of the CPU is to not attempt to draw stuff that’s out-of-view. Right now when I’m rendering, I go through the list of 3,000 buildings and cram all of the vertex and polygon data into OpenGL. It then takes each and every vertex and runs it through a bunch of mathematical mumbo-jumbo to figure out the answer to the question: Where – given the current location and orientation of the camera – will this particular building end up on-screen? Nowhere? Oh? This building is behind the camera? Oh well. Chuck it. Let’s see the next one.
This is a very expensive form of math, and it’s all being done by the CPU. I could use vertex shaders to offload most of this work onto the GPU. (A few years ago you used to hear about stuff supporting “Hardware Transform & Lighting”. This is what that term is all about.) I might resort to that if I become desperate, but adding shaders is a lot of new code and complexity. The shaders themselves are pretty simple, and T&L shaders are usually given away as ten-line example programs. But to get access to shaders requires a lot of boilerplate code and doing a little meet & greet with the system hardware to make sure it’s down with Transform & Lighting. I’d really like to avoid doing that if I can get the job done with the tools I’m already using.
With a 90 degree viewing angle, the camera isn’t going to see more than 1/4 of the stuff around it. But the program is doing matrix math on everything before it gets to the point where OpenGL realizes I’ve been wasting its time.
There are a lot of techniques you can use to figure out what’s in view, but I want something simple to code and easy to manage. I have a lot of flexibility in my design and not much in my timetable, so it’s better to go for a rough system that solves 90% of the problem than for a robust and complex one that solves 98% of it.
Observation: The city data is more or less 2-dimensional. I can simply look at the angle to the object from the camera position on the horizontal plane. If it’s more than 45 degrees to the left or right, then it’s out of the 90 degree cone of stuff we can see, and it can be discarded before we go bothering OpenGL with it. But, in order to get an angle…
Let’s look at the code I use to get an angle from A to B on a 2d plane. Let’s see… a bunch of floating-point math. Some more floating point math. And an arctangent operation. Crimey. Finally there are some annoying comparisons and faffing about to make sure the angle stays between 0 and 360. This is not something you want to do to 4,000 objects in the scene. (Actually, that’s how many buildings there are. The total count of objects eligible for a ride through the OpenGL funhouse is over 9,000 once you take into account street lights and cars and some other bits.)
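For reference, the kind of angle check being described looks roughly like the following. This is a minimal sketch rather than the program's actual code: the function names are made up for the example, and it assumes 0 degrees points along +x with angles increasing counter-clockwise.

```python
import math

def angle_to(ax, ay, bx, by):
    """Angle in degrees from point A to point B, wrapped into the 0 to 360 range."""
    angle = math.degrees(math.atan2(by - ay, bx - ax))  # the arctangent step
    return angle % 360.0

def angle_difference(a, b):
    """Smallest signed difference between two angles, between -180 and 180."""
    diff = (a - b) % 360.0
    return diff - 360.0 if diff > 180.0 else diff

def in_view(cam_x, cam_y, cam_heading, obj_x, obj_y, fov=90.0):
    """True if the object lies within fov/2 degrees of the camera heading."""
    to_object = angle_to(cam_x, cam_y, obj_x, obj_y)
    return abs(angle_difference(to_object, cam_heading)) <= fov / 2.0
```

With a 90 degree view, `in_view(0, 0, 0, 10, 5)` passes (the object is about 27 degrees off-center) and `in_view(0, 0, 0, 5, 10)` fails (about 63 degrees). None of this is expensive once, but it adds up when repeated for every object on every frame.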
Doing an angle check to an object would possibly be faster than cramming it through OpenGL, although that might not be true in all cases. It might be faster to simply draw a one-polygon car than to do a check on all and then draw the genuinely visible ones. But even in a best-case scenario, it would still be giving the CPU a brutal workout while the GPU naps on the couch. To alleviate this, I divide the world into regions:
| Overhead view. The world is divided up on this grid. The red dot is the camera, and the blue is the cone that will be visible to the viewer. |
Now instead of doing an angle check on all of those thousands of objects, I’ll just do the check on my 256 regions to see which ones are in view. At render time, I go over the visible regions and render their contents, ignoring everything else.
| Objects inside of the green regions will be drawn. Everything else will be ignored. |
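The region scheme can be sketched like this. It is an illustration under assumptions, not the program's code: the world size, the `(x, y, name)` object tuples, and the use of each region's center point for the angle check are all invented for the example.

```python
import math

GRID = 16            # 16 x 16 regions, as described in the post
WORLD_SIZE = 1024.0  # hypothetical world width and depth, in world units
CELL = WORLD_SIZE / GRID

def region_of(x, y):
    """Grid cell containing the world position (x, y)."""
    rx = min(max(int(x // CELL), 0), GRID - 1)
    ry = min(max(int(y // CELL), 0), GRID - 1)
    return rx, ry

def build_regions(objects):
    """Bucket (x, y, name) objects by region once, when the city is built."""
    regions = {}
    for obj in objects:
        regions.setdefault(region_of(obj[0], obj[1]), []).append(obj)
    return regions

def region_visible(rx, ry, cam_x, cam_y, heading, fov=90.0):
    """Angle check against the center of one region (an assumed simplification)."""
    cx = (rx + 0.5) * CELL
    cy = (ry + 0.5) * CELL
    angle = math.degrees(math.atan2(cy - cam_y, cx - cam_x))
    diff = (angle - heading + 180.0) % 360.0 - 180.0
    return abs(diff) <= fov / 2.0

def visible_regions(cam_x, cam_y, heading, fov=90.0):
    """One angle check per region: 256 checks instead of thousands."""
    return {(rx, ry)
            for rx in range(GRID)
            for ry in range(GRID)
            if region_visible(rx, ry, cam_x, cam_y, heading, fov)}

def objects_to_draw(regions, cam_x, cam_y, heading, fov=90.0):
    """Collect the contents of visible regions; everything else is skipped."""
    drawn = []
    for key in visible_regions(cam_x, cam_y, heading, fov):
        drawn.extend(regions.get(key, []))
    return drawn
```

The per-object work moves to build time, where each object is dropped into its bucket once. At render time the cost is a fixed number of region checks, no matter how many objects the city holds.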
This is not without its drawbacks.
1) Buildings sometimes straddle the line between regions, meaning the center of the building might be sitting in a red (hidden) region but a bit of it is poking into a green (visible) region. It would mean that you’d see an empty lot on the edge of the screen, but as you turned to look a building would pop in out of nowhere. This is very noticeable and ugly. Identifying and dealing with these cases can get very tricky.
Solution: I widened the assumed viewing angle to 100 degrees. This means I’ll be rendering (or attempting to render) a lot of extraneous crap just to catch a few difficult objects. If I’m still having speed problems later, this is one thing I can revisit and try to improve.
2) Doing these calculations on a 2D plane is fast, but it assumes you’re looking straight ahead, level. Tilting the view downward will bring nearby red regions into view. As I look down and pan left and right I can see buildings popping in and out of view. That wouldn’t be bad, except that looking down on the city is pretty much the entire point of the project.
Solution: I mark all adjacent regions visible. No big deal. It’s only a few dozen objects. Doing the math to truly sort this stuff out wouldn’t be worth it. I doubt I could even measure the performance improvement from a dozen stragglers.
3) I’m drawing a lot of extra buildings. As soon as the corner of a region enters view, everything in that region gets drawn. I can make the regions smaller, and thus more accurate. This would cull out a lot of stuff that’s still way out of view, at the expense of giving me more angle checks to perform. Right now the world is divided into a grid of 16×16 regions, for a total of 256. If I made the world (say) 32×32 regions, I’d have 1,024 checks to do. I’d have twice the accuracy at the expense of doing four times as many calculations. I could also move to a 8×8 grid and do one-fourth of the calculations, at the expense of having even more overdraw. Hm.
This is a fiendish one, because the sweet spot – the optimal spot – is somewhere in the middle. A 2×2 grid would be worthless and cull no objects. A 128×128 grid would require 16,384 angle checks, which is actually greater than the number of objects in the scene. That’s worse than doing the angle check on each and every building individually. I want the region size to be big enough that every region will contain more than one building, but not so big that I’ll be attempting to render a lot of off-to-the-side stuff. (Note also that this interacts with the texture-sorting I talked about last time. If I’m rendering by region, I can no longer render all of the same-textured objects together. Smaller regions mean more texture switching. Sorting them before each rendering pass would fix this by having the CPU spend a lot more effort in order to save the GPU a little effort, which is the very opposite of what I want to do.)
Solution: Lots of tests later, and it is apparent that my initial choice of a 16×16 grid was already the optimal one. There are reasons why the optimal setting might be different on a different machine. I must not think about this or it will drive me crazy.
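The arithmetic behind that trade-off, plus the "mark adjacent regions visible" fix, can be sketched as follows. The object count is rounded from the "over 9,000" figure mentioned earlier, the function names are hypothetical, and reading "adjacent" as "the regions surrounding the camera's own region" is an assumption.

```python
OBJECT_COUNT = 9000  # rounded from "over 9,000" objects; for illustration only

def grid_cost(n, object_count=OBJECT_COUNT):
    """Angle checks needed for an n x n grid, and how coarse each region is."""
    checks = n * n
    return {
        "grid": n,
        "angle_checks": checks,
        "objects_per_region": object_count / checks,
        # Past this point, checking every object individually would be cheaper.
        "cheaper_than_per_object": checks < object_count,
    }

def with_camera_neighbors(visible, cam_region, grid=16):
    """Also mark the camera's region and the regions around it as visible.

    Assumed reading of the fix for looking downward: nearby regions are
    drawn regardless of the 2D angle check.
    """
    cx, cy = cam_region
    extra = {(x, y)
             for x in range(cx - 1, cx + 2)
             for y in range(cy - 1, cy + 2)
             if 0 <= x < grid and 0 <= y < grid}
    return set(visible) | extra

for n in (8, 16, 32, 128):
    print(grid_cost(n))
```

Going from 16 to 32 quadruples the checks (256 to 1,024) while only halving the width of each region, and 128 needs 16,384 checks, more than the number of objects. Where the real sweet spot lands still has to be measured, since the cost of a check relative to the cost of drawing an extra object depends on the machine.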
So far, I’ve worked the framerate up to 130-ish. I’ve technically met my goal, although framerate is still under 100 when the bloom effect is turned on.
I’m sure I can do better than that. I still have another round of optimizations to make. Ideally, I’d like to get it working so that it’s running at over 100fps even when bloom is on. We’ll see how the next round of changes goes.
I’m sure some people will notice that I re-worked the sky to be a single gradient, added some simple radio towers on top of buildings, and added some other bits of sparkle power. None of it is set in stone. These were just experimental visual changes I made over the weekend and not related to any of this optimization work. I’ll have time to fuss with that stuff once I have the rendering working smoothly.
Blah blah Left 4 Dead, yadda yadda read the comic blah blah.
Anyway, The Escapist is up for a Webby award. I’d love it if they won. Not just because it’s a great site, but because I’d like to see more support for this sort of content in general: Grown up, interested in fun, not given over to adolescent posturing, fanboyism, and obsession over review scores. And of course there are my own efforts.
I know it’s annoying to vote. You must register, and navigating the Webby site is not something I would categorize as a rewarding experience. But if you have the inclination your efforts would be a much appreciated show of support for the sort of stuff I do over there.
The project is now officially over budget. I expended vast quantities of time obsessing over framerates and benchmarking this weekend. My budget was about 30 hours, and while I haven’t been keeping a rigorous tally of hours, I’m well past the deadline and not yet done. But the end is in sight. I’ve determined to get this thing done this week. All told, it looks like I’ll have sunk 40 hours into it. For perspective, if this was a game with an eighteen month budget, I would just have missed going gold and admitted that we were going to need another six months. And that our entire staff spent six months of the budget playing Left 4 Dead. Good thing I don’t have investors.
I’m afraid this stretch of the project is likely to be a bit dry. It can’t all be colored pixels and bloom lighting. Sometimes I have to go and fuss over dull numbers and time things, which makes for unspectacular screenshots. I’ll do my best to make this interesting.
By now the program is running like a slow pig, and has been for a while. 30FPS for a little city like this is appallingly slow, and before I move on to grander things I need to know how it’s going to run.
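The first step in speeding up a program like this is finding out where the bottlenecks are. There are several aspects of rendering that I look into when facing slowdowns:
1) CPU: The program itself can’t gather up the work fast enough.
2) Throughput: The work can’t be handed over to the graphics hardware fast enough.
3) Fill rate: The GPU can’t draw the pixels fast enough.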
An analogy I like: The CPU is a waitress taking orders. The throughput is the rate at which she can put order strips up to be read by the cook, who is the GPU in this case. Fill rate problems mean he’s not cooking stuff fast enough. (Yes, I know they’re called servers today, and not “waitresses”. But the restaurant used in this analogy is a roadside greasy spoon diner in 1978. She doesn’t care if you call her a waitress, as long as you tip well and the kids don’t make too much noise.)
The third type of slowdown is easy to spot. If making the window smaller speeds things up, then it’s a fill rate problem. If not, then it’s probably a CPU problem. I’m sure I’m not having a fill-rate problem, but I check anyway because you don’t begin research by assuming you know everything.
I shrink the window. No change in the framerate. I shrink it to almost nothing. Still no improvement. Just for fun, I turn off writing to the z-buffer and change the program to draw the buildings as translucent. This will make all of those polygons many, many more times expensive and will ensure that the GPU has to draw every. single. one. Then I make the program run full-screen.
Take that, fancy-pants hardware! Let’s see how you like choking on 40,000 two-sided alpha-blended, screen-filling polygons!
| Hm. That actually looks kind of cool. |
No change.
Wow.
Not only is my graphics hardware not the bottleneck (which I already suspected) but it’s not even being reasonably challenged. Going back to the waitress analogy, here we have a cook that can prepare meals faster than the waitress can write them down. She writes down a four-order meal with appetizers and desserts, and the food is done before she can get back out on the floor to take another order.
As someone pointed out earlier in the series, these new cards are designed for rendering with complex pixel shaders that do fancy bump mapping, specular mapping, texture layering, dynamic light passes, lighting objects at grazing angles, and a whole bunch of other intense math. On every single pixel it draws. Here I’m simply asking it to take one lone texture map and apply it to the polygon, and I doubt I could hope to keep the thing busy with such a lightweight job.
Actually, tests reveal there is one thing it’s slightly sensitive to, which is changing textures. Back in step one I made texture maps for the buildings. Think of rendering as painting. If I ask you to paint a red stroke on the canvas, then a blue one, then a red one, it will take longer than it would to do both red strokes and then the blue. It takes a moment to lower your brush, clean it off, load it up with paint again, and bring it back up to the canvas. I can get a small performance boost by making sure I render all of the buildings that share a common texture at the same time. With eight textures and (roughly) 3,000 buildings, rendering them in a random order will cause the graphics card to have to change paint over 2,500 times. If I sort the buildings, it will only have to do so 8 times. This gives me a modest performance boost of around 10fps. That’s nice, but it’s trivial compared to the real optimizations I’ll need to do. I should have gone after inefficiencies like this one later on, and gone after the low-hanging fruit first. But I did this one more or less by accident as part of my tests.
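The texture-switch count is easy to check with a small simulation. This is a sketch, not the renderer: each building is reduced to a texture number, the buildings are split evenly across the eight textures, and `count_texture_binds` is a made-up helper that counts how often the "paint" would change.

```python
import random

def count_texture_binds(texture_ids):
    """Count how many times the active texture changes while drawing in order."""
    binds = 0
    current = None
    for texture_id in texture_ids:
        if texture_id != current:
            binds += 1           # a real renderer would switch textures here
            current = texture_id
    return binds

random.seed(1)                               # fixed seed so runs are repeatable
buildings = [i % 8 for i in range(3000)]     # 3,000 buildings, 8 textures
random.shuffle(buildings)

unsorted_binds = count_texture_binds(buildings)
sorted_binds = count_texture_binds(sorted(buildings))
print(unsorted_binds, sorted_binds)
```

In a random order, any two neighbors in the list share a texture only about one time in eight, so the expected number of changes is roughly 3,000 × 7/8, or about 2,600. Sorted, it is exactly 8.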
(Note that I’m writing this after the fact, and I didn’t keep a perfect record of how much of a performance boost I got from each step. The numbers I give for framerates are vague recollections and guesses.)