|By Shamus||Apr 30, 2009||61 comments|
More Performance Enhancements
- I put more blank space between the buildings, gave the existing buildings a slightly larger footprint, and made the streets wider. Surprisingly, this ended up cutting about 25% of the buildings in the scene, which actually looked better.
- I merged the individual street lights into strips, like Christmas lights. This means all the lights on the same side of the street are actually one big polygon. The streetlights look goofy if you get too low, but the change eliminated thousands and thousands of polygons from the scene. I doubt I’ll ever be really happy with how the streets look, but I think I at least have it to the point where I can live with it.
- I added a widescreen / letterbox mode where the drawn area will only take up part of the screen space. This saved a not-at-all-impressive 2fps on my machine, but I’m hoping it will make the thing run better on low-end graphics cards. Users with fill-rate problems should see improvement when they use it.
- I limit the cars to ten updates per second. The car movement code is fairly cheap, but there will potentially be a lot of cars in the scene and after a few hundred iterations it starts to add up. Limiting the cars to ten updates per second means that their movement would stutter a bit up close. But since they creep along far away from the camera, there is no reason to waste time calculating each and every tiny little movement, most of which will be too minute to be visible to the user.
- Cars are limited in how far away you can see them. After a car gets a certain distance from the camera, it’s taken out of play and dropped somewhere into the visible area in front of the camera. This keeps the apparent car density high.
- Related to the above, the “distance” of the car is calculated in manhattan units. (I can’t find a definition of manhattan units anywhere online. I know I’m not the only person to use this term. I wonder what other people call it?) MU are a distance check that involves calculating the distance between A and B by simply adding the horizontal and vertical offset. If you’re a kilometer east and a kilometer south, then you’d walk about 1.4 kilometers on the diagonal but you’re 2 kilometers away in manhattan units. A real distance check calculates how far it is from A to B if you were flying. A manhattan check (do you still capitalize it when it’s the name of a unit and not a place?) figures out how far you would travel if you went from A to B by traveling only along one axis at a time. A real distance check requires two multiply operations, an add operation, and then a square root. Manhattan units require a simple addition operation. Using MU for visibility means that you can see farther looking down a grid-aligned street than you can by looking diagonally over the city. But this is ideal, since buildings are usually blocking the diagonal view. The cases where the user might see through a gap in the buildings several diagonal blocks away are rare and fleeting, and the user is not likely to notice a lack of cars. The performance boost from this is probably too small to measure, but I loved finding a case where I could shamelessly cut corners with no perceptible loss in quality.
- I fixed a bug where I was rendering all of the black flat-color stuff twice. I was rendering it once with the rest of a building, and with a texture on it. (Although since I wasn’t setting texture coordinates, it was just taking a single pixel and smearing it all over the object.) So it was being drawn once with a pure black section of a texture covering it, and then drawn again later as black polygons with no texture. I’m sure I don’t have to explain why this mistake wasn’t visible while the program was running.
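The car throttling and distance culling from the list above might look roughly like the following. This is a minimal sketch, not the project's actual code: `Car`, `Camera`, `CarSim`, `UpdateCars`, the 1000-unit visible range, and the respawn placement are all assumptions made up for illustration. (For what it's worth, the "manhattan units" check is usually called Manhattan distance or taxicab distance.)

```cpp
#include <cmath>
#include <vector>

// Hypothetical records; the field names are illustrative only.
struct Car {
    float x, z;    // position on the ground plane
    float dx, dz;  // velocity in world units per second
};

struct Camera {
    float x, z;                  // position on the ground plane
    float forward_x, forward_z;  // unit vector the camera is facing
};

struct CarSim {
    std::vector<Car> cars;
    float accumulator = 0.0f;  // time since the last car update
};

const float kCarUpdateInterval = 0.1f;   // ten updates per second
const float kCarVisibleRange = 1000.0f;  // assumed cutoff, in Manhattan units

// Manhattan distance: add the per-axis offsets. No multiply, no sqrt.
float ManhattanDistance(float ax, float az, float bx, float bz) {
    return std::fabs(ax - bx) + std::fabs(az - bz);
}

// Call once per rendered frame. Cars only move when at least a tenth of a
// second has piled up. Returns true if the cars were stepped this frame.
bool UpdateCars(CarSim& sim, float frame_seconds, const Camera& cam) {
    sim.accumulator += frame_seconds;
    if (sim.accumulator < kCarUpdateInterval) {
        return false;  // too soon, skip the work entirely
    }
    const float step = sim.accumulator;  // cover all the elapsed time at once
    sim.accumulator = 0.0f;
    for (Car& car : sim.cars) {
        car.x += car.dx * step;
        car.z += car.dz * step;
        if (ManhattanDistance(car.x, car.z, cam.x, cam.z) > kCarVisibleRange) {
            // Placeholder respawn: drop the car halfway out along the view
            // direction. A real version would snap it onto a street.
            car.x = cam.x + cam.forward_x * kCarVisibleRange * 0.5f;
            car.z = cam.z + cam.forward_z * kCarVisibleRange * 0.5f;
        }
    }
    return true;
}
```

At 200fps this skips the car loop on roughly nineteen frames out of twenty, which is the whole point of the throttle.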
The final tally: The program is taking up my entire 1920×1200 desktop and runs at over 200fps. It goes down to ~90 when bloom lighting is on and becomes very fill-rate sensitive (finally! it’s nice to know the GPU has some sort of limits) but I’m happy enough with performance now.
The biggest drain on performance now is bloom lighting, which is optional. I could make bloom run twice as fast, but it would look a little less appealing. In fact, it’s only now that I’m questioning if I’m doing bloom “right”.
A Bit About Bloom
I’ve never looked up how other programs generate bloom lighting. I saw bloom in action for the first time when I played Deus Ex 2. I looked at the blobby artifacts it created and made some assumptions about how it was done. Here is how I thought it worked:
(Note that the example bloom I’m using here is a simulated effect, generated in Paint Shop Pro. The artifact I want to show you is really visible in motion, but actually kind of subtle in a still frame, so I’m exaggerating the effect for demonstration purposes.)
This is the full view of the scene at a given instant during the run of the program:
|The city, without bloom.|
But before I render that, I render a lower-resolution version of the scene into a texture:
|The bloom buffer, if you will. It's actually not THIS pixelated, but you get the idea.|
After I have a low-res version of the scene saved in a little texture, I render the real, full screen deal for the viewer. (The first image.) Once I have that rendered, I take the pixelated bloom buffer and blend it with the scene using a “brighten filter”. People using Photoshop would probably recognize it as “Linear Dodge (Add)”, a close cousin of the “Screen” effect. OpenGL programmers call it glBlendFunc (GL_ONE, GL_ONE);.
|Split image. On the left side, the bloom is pixelated. On the right, it's smooth.|
What you end up with is the left half of the above image: big glowy rectangles over bright areas of the scene, instead of blurry circular blobs of light. Again, it looks fine in a still frame, but when the camera is flying around the scene the rectangles tend to stick out.
But you can soften the rectangles out by rendering the bloom layer more than once, and displacing it slightly each time. This fuzzes out the edges of the rectangle and gives it the dreamy glow I want. It also means blending a huge, screen-covering polygon multiple times. As I mentioned in my previous post, blending is (or can be) slow.
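To make the idea concrete, here is a CPU-side sketch of the same trick on a grayscale image. It is only an illustration under stated assumptions: the real program does this on the GPU with a blended, screen-covering polygon, and the `Image` type, the function names, the four-corner offsets, and the choice to split the glow evenly between passes are all made up for this example. The saturating add stands in for additive blending.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical grayscale image, row-major, one byte per pixel.
struct Image {
    int width;
    int height;
    std::vector<std::uint8_t> pixels;

    // Read a pixel, clamping coordinates to the image edges.
    std::uint8_t At(int x, int y) const {
        x = std::max(0, std::min(x, width - 1));
        y = std::max(0, std::min(y, height - 1));
        return pixels[y * width + x];
    }
};

// Add the low-res bloom buffer onto the frame, shifted by an offset given
// in frame pixels. Each pass contributes 1/passes of the glow, and the sum
// saturates at 255 the way an additive blend saturates at full white.
void AddBloomPass(Image& frame, const Image& bloom,
                  int offset_x, int offset_y, int passes) {
    for (int y = 0; y < frame.height; ++y) {
        for (int x = 0; x < frame.width; ++x) {
            // Nearest-texel lookup: this is what makes the rectangles.
            const int bx = (x + offset_x) * bloom.width / frame.width;
            const int by = (y + offset_y) * bloom.height / frame.height;
            const int glow = bloom.At(bx, by) / passes;
            const int sum = frame.pixels[y * frame.width + x] + glow;
            frame.pixels[y * frame.width + x] =
                static_cast<std::uint8_t>(std::min(sum, 255));
        }
    }
}

// One pass with no offset: hard-edged, blocky bloom.
void HardBloom(Image& frame, const Image& bloom) {
    AddBloomPass(frame, bloom, 0, 0, 1);
}

// Four passes nudged toward the corners: the block edges get fuzzed out.
void SoftBloom(Image& frame, const Image& bloom, int nudge) {
    const int offsets[4][2] = {
        {-nudge, -nudge}, {nudge, -nudge}, {-nudge, nudge}, {nudge, nudge}};
    for (const auto& o : offsets) {
        AddBloomPass(frame, bloom, o[0], o[1], 4);
    }
}
```

With an 8×8 frame and a 2×2 bloom buffer, `HardBloom` jumps straight from full glow to none at the block boundary, while `SoftBloom` with a nudge of 2 leaves pixels near the boundary with a partial glow. The cost is visible too: four full-screen passes instead of one.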
Is this how games do bloom? I don’t know. (Actually, modern games will invariably be using pixel shaders to achieve this effect, which will be faster and probably let them soften those edges in a single draw. But I’m bull-headed and insist on building this thing using stone knives and bearskins.) At any rate, there is an annoying but unsurprising trade off between performance and quality going on here, and the “sweet spot” between the two is guaranteed to be very different for different hardware. Bloom may well be unusably slow on hardware just two GPU generations old. All I can do is give users a way to turn it off and see how it goes.
Feedback & Performance Suggestions
Some people made some great suggestions on how the program could be further sped up. The use of vertex buffers was suggested. A vertex buffer lets the program gather up all of the geometry in the world and stick it right on the graphics card. (Assuming the graphics card has the memory for it, which isn’t a problem in my case.) This would eliminate the most expensive thing my program has to do, which is shoveling geometry to the GPU as fast as it can. However, I’m pretty sure integrated graphics cards don’t support them. (Actually, data on what hardware supports them and what doesn’t is as scarce as copies of Duke Nukem Forever.) Since I can’t count on all users having VB support, I’d have to use two rendering paths: The one I have now for people with low-end hardware, and the super-duper fast one for people with newer hardware. This means people with great hardware will go even faster still, and people with lower-end hardware will see no benefit. Adding a bunch of code to push my framerate up to 500fps while doing nothing to help the guy chugging along at 20fps is a brilliant way to expend effort in such a way so as to not benefit anyone. So let’s skip that. (This is assuming I’m right that there are computers less than five years old that don’t support VBs.)
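For the curious, picking between two rendering paths would probably start with a check like the one below. This is a hedged sketch and not something the program does: `HasExtension`, `ChooseRenderPath`, and the `RenderPath` enum are hypothetical names. In a live program the extension string would come from `glGetString(GL_EXTENSIONS)` once a context exists, and `GL_ARB_vertex_buffer_object` is the extension that exposes vertex buffers on drivers older than OpenGL 1.5.

```cpp
#include <cstring>

// Returns true if `name` appears as a whole, space-separated token in the
// extension list. A plain substring search is not enough, because one
// extension name can be the prefix of another.
bool HasExtension(const char* extensions, const char* name) {
    if (extensions == nullptr || name == nullptr || *name == '\0') {
        return false;
    }
    const std::size_t len = std::strlen(name);
    const char* p = extensions;
    while ((p = std::strstr(p, name)) != nullptr) {
        const bool starts_token = (p == extensions) || (p[-1] == ' ');
        const bool ends_token = (p[len] == ' ') || (p[len] == '\0');
        if (starts_token && ends_token) {
            return true;
        }
        p += len;  // keep looking past this partial match
    }
    return false;
}

// Hypothetical switch between the existing path and a vertex buffer path.
enum class RenderPath { Legacy, VertexBuffer };

RenderPath ChooseRenderPath(const char* extensions) {
    return HasExtension(extensions, "GL_ARB_vertex_buffer_object")
               ? RenderPath::VertexBuffer
               : RenderPath::Legacy;
}
```

The check itself is cheap. The real cost is everything behind it: writing and testing a second renderer that only helps machines which are already fast.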
Someone else suggested that a quadtree would be an ideal way to handle the culling I was doing in my last post. It’s tempting, but that would be a lot of work and I’ve already hit my vaguely defined goal.
I know I promised a wrap-up to this series this week, but the tiresome impositions of my mortal form have frustrated my progress. Also I played a bunch of Left 4 Dead. But I am actually done working on the thing. I just need to write about the final steps and release it. My plan is to release it two ways:
1) The source and project files, so others may tinker or point and laugh.
2) As a Windows screensaver.
Next time: Finishing touches.