|By Shamus||Apr 27, 2009||27 comments|
The project is now officially over budget. I expended vast quantities of time obsessing over framerates and benchmarking this weekend. My budget was about 30 hours, and while I haven’t been keeping a rigorous tally of hours, I’m well past the deadline and not yet done. But the end is in sight. I’ve determined to get this thing done this week. All told, it looks like I’ll have sunk 40 hours into it. For perspective, if this was a game with an eighteen month budget, I would just have missed going gold and admitted that we were going to need another six months. And that our entire staff spent six months of the budget playing Left 4 Dead. Good thing I don’t have investors.
I’m afraid this stretch of the project is likely to be a bit dry. It can’t all be colored pixels and bloom lighting. Sometimes I have to go and fuss over dull numbers and time things, which makes for unspectacular screenshots. I’ll do my best to make this interesting.
By now the program is running like a slow pig, and has been for a while. 30FPS for a little city like this is appallingly slow, and before I move on to grander things I need to know how it’s going to run.
The first step in speeding up a program like this is finding out where the bottlenecks are. There are several aspects of rendering that I look into when facing slowdowns:
- CPU bottleneck: The program just isn’t running fast enough and so it isn’t sending polygons to the GPU fast enough.
- Throughput bottleneck: The software is capable of sending the polygons faster, and the GPU is capable of rendering them faster, but because of drivers or hardware limitations you just can’t send the data fast enough. You’ll run into problems like this if you’re trying to render a whole bunch of lightweight, easy-to-draw polygons. (With “bunch” in this case meaning “hundreds of thousands” or even “millions”.) If you’re drawing a bunch of tiny little triangles with no special lighting effects or texturing, (what the heck are you drawing, anyway?) you can run into this bottleneck. I don’t encounter this very often due to the fact that I’m usually working on the lower end of the tech curve. (Although I’m pretty sure I ran into it during the terrain project. I can’t remember now.) In any case, I understand this bottleneck is one of the reasons for the move from PCI to AGP, and from AGP to PCIe slot interfaces. The newer bus interface lets you pump more data to the GPU. I have… wait, let me write some code to do a count…. Okay: 40,804 polygons. That’s not counting a few special case groups like the cars, ground or sky, but that number is probably within 1,000 polygons of the total. So I don’t think I have to worry about throughput on anything built in this century.
- Fill-rate bottleneck: This has always been where I do most of my work. Problems like this usually come from filling the screen with expensive polygons. In contrast to the earlier case where I might have been drawing millions of cheap polygons, perhaps I’m slowing things down by drawing just a few expensive ones. Fifty polygons sounds pretty pathetic compared to a million, but if all fifty polygons cover the entire screen and do complex blending, then you’ll end up with a fill rate problem. Your program is sending those 50 polygons nice and fast, and there are so few they won’t clog up the hardware pipeline, but the graphics card is going to take a long time to draw them. Full-screen effects (like bloom lighting and depth-of-field) can cause fill rate problems.
An analogy I like: The CPU is a waitress taking orders. The throughput is the rate at which she can put order strips up to be read by the cook, who is the GPU in this case. Fill rate problems mean he’s not cooking stuff fast enough. (Yes, I know they’re called servers today, and not “waitresses”. But the restaurant used in this analogy is a roadside greasy spoon diner in 1978. She doesn’t care if you call her a waitress, as long as you tip well and the kids don’t make too much noise.)
The third type of slowdown is easy to spot. If making the window smaller speeds things up, then it’s a fill rate problem. If not, then it’s probably a CPU problem. I’m sure I’m not having a fill-rate problem, but I check anyway because you don’t begin research by assuming you know everything.
I shrink the window. No change in the framerate. I shrink it to almost nothing. Still no improvement. Just for fun, I turn off writing to the z-buffer and change the program to draw the buildings as translucent. This will make all of those polygons many, many more times expensive and will ensure that the GPU has to draw every. single. one. Then I make the program run full-screen.
Take that, fancy-pants hardware! Let’s see how you like choking on 40,000 two-sided alpha-blended, screen-filling polygons!
|Hm. That actually looks kind of cool.|
Not only is my graphics hardware not the bottleneck (which I already suspected) but it’s not even being reasonably challenged. Going back to the waitress analogy, here we have a cook that can prepare meals faster than the waitress can write them down. She writes down a four-order meal with appetizers and desserts, and the food is done before she can get back out on the floor to take another order.
As someone pointed out earlier in the series, these new cards are designed for rendering with complex pixel shaders that do fancy bump mapping, specular mapping, texture layering, dynamic light passes, lighting objects at grazing angles, and a whole bunch of other intense math. On every single pixel it draws. Here I’m simply asking it to take one lone texture map and apply it to the polygon, and I doubt I could hope to keep the thing busy with such a lightweight job.
Actually, tests reveal there is one thing it’s slightly sensitive to, which is changing textures. Back in step one I made texture maps for the buildings. Think of rendering as painting. If I ask you to paint a red stroke on the canvas, then a blue one, then a red one, it will take longer than it would to do both red strokes and then the blue. It takes a moment to lower your brush, clean it off, load it up with paint again, and bring it back up to the canvas. I can get a small performance boost by making sure I render all of the buildings that share a common texture at the same time. With eight textures and (roughly) 3,000 buildings, rendering them in a random order will cause the graphics card to have to change paint over 2,500 times. If I sort the buildings, it will only have to do so 8 times. This gives me a modest performance boost of around 10fps. That’s nice, but it’s trivial compared to the real optimizations I’ll need to do. I should have gone after inefficiencies like this one later on, and go after the low-hanging fruit first. But I did this one more or less by accident as part of my tests.
(Note that I’m writing this after the fact, and I didn’t keep a perfect record of how much of a performance boost I got from each step. The numbers I give for framerates are vague recollections and guesses.)