|By Shamus||Apr 28, 2009||Programming, Projects||51 comments|
Time to optimize the program. If you’ll recall, it’s running at (maybe) 30fps. I want it running at 100fps or better. The big thing that needs to be done is to take a lot of the weight off of the CPU. It’s thrashing through 4,000+ buildings every time it draws the scene, gathering up polygons and sending them to the renderer. Let’s see if we can’t cull those numbers.
The most obvious way to take some of the load off of the CPU is to not attempt to draw stuff that’s out-of-view. Right now when I’m rendering, I go through the list of 4,000 buildings and cram all of the vertex and polygon data into OpenGL. It then takes each and every vertex and runs it through a bunch of mathematical mumbo-jumbo to figure out the answer to the question: Where – given the current location and orientation of the camera – will this particular building end up on-screen? Nowhere? Oh? This building is behind the camera? Oh well. Chuck it. Let’s see the next one.
This is a very expensive form of math, and it’s all being done by the CPU. I could use vertex shaders to offload most of this work onto the GPU. (A few years ago you used to hear about stuff supporting “Hardware Transform & Lighting”. This is what that term is all about.) I might resort to that if I become desperate, but adding shaders is a lot of new code and complexity. The shaders themselves are pretty simple, and T&L shaders are usually given away as ten-line example programs. But to get access to shaders requires a lot of boilerplate code and doing a little meet & greet with the system hardware to make sure it’s down with Transform & Lighting. I’d really like to avoid doing that if I can get the job done with the tools I’m already using.
With a 90 degree viewing angle, the camera isn’t going to see more than 1/4 of the stuff around it. But the program is doing matrix math on everything before it gets to the point where OpenGL realizes I’ve been wasting its time.
There are a lot of techniques you can use to figure out what’s in view, but I want something simple to code and easy to manage. I have a lot of flexibility in my design and not much in my timetable, so it’s better to go for a rough system that solves 90% of the problem than for a robust and complex one that solves 98% of it.
Observation: The city data is more or less 2-dimensional. I can simply look at the angle to the object from the camera position on the horizontal plane. If it’s more than 45 degrees to the left or right, then it’s out of the 90 degree cone of stuff we can see, and it can be discarded before we go bothering OpenGL with it. But, in order to get an angle…
Let’s look at the code I use to get an angle from A to B on a 2d plane. Let’s see… a bunch of floating-point math. Some more floating point math. And an arctangent operation. Criminy. Finally there are some annoying comparisons and faffing about to make sure the angle stays between 0 and 360. This is not something you want to do to 4,000 objects in the scene. (Actually, that’s how many buildings there are. The total count of objects eligible for a ride through the OpenGL funhouse is over 9,000 once you take into account street lights and cars and some other bits.)
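For the curious, that sort of angle function usually looks something like this (a sketch in my own words, not the project’s actual code; the name is mine):

```cpp
#include <cmath>

// Heading from point A to point B on the horizontal plane, in degrees,
// normalized to [0, 360). The atan2 call handles the quadrant logic,
// but it's a transcendental operation -- this is the expensive part.
float AngleTo(float ax, float az, float bx, float bz)
{
  float angle = atan2f(bz - az, bx - ax) * (180.0f / 3.14159265f);
  // The "annoying comparisons and faffing about": keep it in 0..360.
  if (angle < 0.0f)
    angle += 360.0f;
  return angle;
}
```

Cheap in isolation, but multiply it by thousands of objects per frame and it adds up.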
Doing an angle check on an object would probably be faster than cramming it through OpenGL, although that might not be true in all cases. It might be faster to simply draw a one-polygon car than to do a visibility check on everything and then draw only the genuinely visible ones. But even in the best-case scenario, it would still give the CPU a brutal workout while the GPU naps on the couch. To alleviate this, I divide the world into regions:
|Overhead view. The world is divided up on this grid. The red dot is the camera, and the blue is the cone that will be visible to the viewer.|
Now instead of doing an angle check on all of those thousands of objects, I’ll just do the check on my 256 regions to see which ones are in view. At render time, I go over the visible regions and render their contents, ignoring everything else.
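The per-region test might look roughly like this (the names, the region size, and the grid layout here are my assumptions for illustration, not the project’s actual code):

```cpp
#include <cmath>

const int GRID = 16;             // 16x16 regions, as described above
const float REGION_SIZE = 64.0f; // world units per region -- my assumption
const float HALF_FOV = 50.0f;    // half of the widened 100-degree cone

// Signed difference between two headings, wrapped into (-180, 180].
static float AngleDelta(float a, float b)
{
  return fmodf(a - b + 540.0f, 360.0f) - 180.0f;
}

// Flag each region visible if the heading from the camera to the region's
// center falls inside the view cone. 256 angle checks instead of 9,000.
void FlagVisibleRegions(float cam_x, float cam_z, float cam_heading,
                        bool visible[GRID][GRID])
{
  for (int x = 0; x < GRID; x++) {
    for (int z = 0; z < GRID; z++) {
      float cx = (x + 0.5f) * REGION_SIZE;
      float cz = (z + 0.5f) * REGION_SIZE;
      float angle = atan2f(cz - cam_z, cx - cam_x) * (180.0f / 3.14159265f);
      visible[x][z] = fabsf(AngleDelta(angle, cam_heading)) <= HALF_FOV;
    }
  }
}
```

At render time, anything in a region whose flag is false never touches OpenGL at all.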
|Objects inside of the green regions will be drawn. Everything else will be ignored.|
This is not without its drawbacks.
1) Buildings sometimes straddle the line between regions, meaning the center of the building might be sitting in a red (hidden) region but a bit of it is poking into a green (visible) region. It would mean that you’d see an empty lot on the edge of the screen, but as you turned to look a building would pop in out of nowhere. This is very noticeable and ugly. Identifying and dealing with these cases can get very tricky.
Solution: I widened the assumed viewing angle to 100 degrees. This means I’ll be rendering (or attempting to render) a lot of extraneous crap just to catch a few difficult objects. If I’m still having speed problems later, this is one thing I can revisit and try to improve.
2) Doing these calculations on a 2D plane is fast, but it assumes you’re looking straight ahead, level. Tilting the view downward will bring nearby red regions into view. As I look down and pan left and right I can see buildings popping in and out of view. That wouldn’t be bad, except that looking down on the city is pretty much the entire point of the project.
Solution: I mark all adjacent regions visible. No big deal. It’s only a few dozen objects. Doing the math to truly sort this stuff out wouldn’t be worth it. I doubt I could even measure the performance improvement from a dozen stragglers.
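That fix is a couple of lines on top of the angle pass. Sketched out (again with my own illustrative names and sizes), it amounts to:

```cpp
// The camera's own region and its eight neighbors are always flagged
// visible, regardless of the angle test, so tilting the camera down
// never reveals an unrendered region right at your feet.
// The 16x16 grid and 64-unit region size are my assumptions.
void FlagAdjacentRegions(float cam_x, float cam_z, bool visible[16][16])
{
  const float REGION_SIZE = 64.0f;
  int rx = (int)(cam_x / REGION_SIZE); // region containing the camera
  int rz = (int)(cam_z / REGION_SIZE);
  for (int dx = -1; dx <= 1; dx++) {
    for (int dz = -1; dz <= 1; dz++) {
      int x = rx + dx;
      int z = rz + dz;
      if (x >= 0 && x < 16 && z >= 0 && z < 16)
        visible[x][z] = true; // overrides the angle test for these nine
    }
  }
}
```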
3) I’m drawing a lot of extra buildings. As soon as the corner of a region enters view, everything in that region gets drawn. I can make the regions smaller, and thus more accurate. This would cull out a lot of stuff that’s still way out of view, at the expense of giving me more angle checks to perform. Right now the world is divided into a grid of 16×16 regions, for a total of 256. If I made the world (say) 32×32 regions, I’d have 1,024 checks to do. I’d have twice the accuracy at the expense of doing four times as many calculations. I could also move to an 8×8 grid and do one-fourth of the calculations, at the expense of having even more overdraw. Hm.
This is a fiendish one, because the sweet spot – the optimal spot – is somewhere in the middle. A 2×2 grid would be worthless and cull no objects. A 128×128 grid would require 16,384 angle checks, which is actually greater than the number of objects in the scene. That’s worse than doing the angle check on each and every building individually. I want the region size to be big enough that every region will contain more than one building, but not so big that I’ll be attempting to render a lot of off-to-the-side stuff. (Note also that this interacts with the texture-sorting I talked about last time. If I’m rendering by region, I can no longer render all of the same-textured objects together. Smaller regions mean more texture switching. Sorting them before each rendering pass would fix this by having the CPU spend a lot more effort in order to save the GPU a little effort, which is the very opposite of what I want to do.)
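The tradeoff is easy to see in numbers. A quick back-of-the-envelope helper (mine, not the project’s; the ~9,000 object count is from above):

```cpp
// Per-frame cost of an n-by-n culling grid: one angle check per region,
// and an average object density assuming objects are spread evenly.
struct GridCost {
  int checks;             // angle checks per frame
  int objects_per_region; // average objects in each region
};

GridCost CostFor(int grid_dim, int total_objects)
{
  GridCost c;
  c.checks = grid_dim * grid_dim; // doubling the grid quadruples this
  c.objects_per_region = total_objects / c.checks;
  return c;
}
// CostFor(16, 9000)  -> 256 checks, ~35 objects per region
// CostFor(128, 9000) -> 16,384 checks, under one object per region
```

Once a region averages less than one object, each check costs more than the per-object test it was supposed to replace.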
Solution: Lots of tests later, and it is apparent that my initial choice of a 16×16 grid was already the optimal one. There are reasons why the optimal setting might be different on a different machine. I must not think about this or it will drive me crazy.
So far, I’ve worked the framerate up to 130-ish. I’ve technically met my goal, although the framerate still falls under 100 when the bloom effect is turned on.
I’m sure I can do better than that. I still have another round of optimizations to make. Ideally, I’d like it running at over 100fps even when bloom is on. We’ll see how the next round of changes goes.
I’m sure some people will notice that I re-worked the sky to be a single gradient, added some simple radio towers on top of buildings, and added some other bits of sparkle power. None of it is set in stone. These were just experimental visual changes I made over the weekend and not related to any of this optimization work. I’ll have time to fuss with that stuff once I have the rendering working smoothly.