I’m still working on the project. The stuff I’m working on now is a little boring and would mostly be a re-hash of stuff I’ve discussed before. So rather than do that, let’s talk about data structures.
Remember the little red cube. We’ll see him again later.
When a programmer sits down to write some software, the question comes, “How will I represent this problem in code?” This is actually one of my favorite parts of the job. (Or hobby, in this case. Unless you’re hiring? Do you want to fund development of this project? It has no gameplay, no plan, no support, and it doesn’t even work yet. Let me know if you’re interested!) If I need to store a historically important date like August 24, 1971, then there are a lot of ways I could do it. I could store it as a text string: “August 24, 1971”. But then it would be hard to do math on it. (Say, to calculate how long it’s been between then and now.) I could store it as a group of three integers: 8, 24, and 1971. I could store it as the number of seconds elapsed since January 1, 1970: 51,840,000. That last one is great for doing math (and is actually how most systems store time internally) but it means you have to do a messy conversion when you want to display the date to the end user. Because if you display a date as 51,840,000 then the end user will find you and burn your house down.
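Here is a rough Python sketch of those three representations side by side. It is only an illustration: the variable and function names are made up for this example, and it assumes the date means midnight UTC.

```python
from datetime import datetime, timedelta, timezone

# Option 1: text. Easy to show, awkward to do math on.
as_text = "August 24, 1971"

# Option 2: three integers (month, day, year).
as_ints = (8, 24, 1971)

# Option 3: seconds since January 1, 1970 (assumed here to be midnight UTC).
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
month, day, year = as_ints
as_seconds = int((datetime(year, month, day, tzinfo=timezone.utc) - EPOCH).total_seconds())


def seconds_to_text(seconds):
    """The conversion back into something a human can read."""
    d = EPOCH + timedelta(seconds=seconds)
    return f"{d:%B} {d.day}, {d.year}"


def days_between(seconds_a, seconds_b):
    """With option 3, date math is plain subtraction."""
    return (seconds_b - seconds_a) // 86400
```

The math-friendly form needs `seconds_to_text` before anyone can look at it, and the display-friendly form would need parsing before anyone can subtract it. That is the tradeoff in miniature.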
What method will be fast? What will take up the least memory? What will make for clear, readable code? These are questions that programmers love to ponder before coming up with the wrong answer and making a mess of things.
Which brings us back to the problem of storing large tables of data.
We look at this and see a table. But internally, computers don’t “do” tables. They’re not really into 2-dimensional kind of stuff. Computer memory is a long, long line of values. If you’ve got four gigabytes of RAM, then you’ve got four billion little memory addresses in a single row, and it’s up to the programmer to make sense out of them.
This is how the table looks to the computer. Red red yellow blue red red blue yellow green blue green green yellow yellow green green. If I want something in the third column, second row, then I have to do a little math to figure out I’m really looking for box #7.
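The "little math" can be sketched like this in Python. The list holds the sixteen colors in the order given above; the names `cells`, `box_number`, and `lookup` are invented for the example, and columns, rows, and boxes all count from 1 to match the text.

```python
WIDTH = 4  # columns in the table

# The table as the computer sees it: one long row of values.
cells = [
    "red", "red", "yellow", "blue",
    "red", "red", "blue", "yellow",
    "green", "blue", "green", "green",
    "yellow", "yellow", "green", "green",
]


def box_number(column, row, width=WIDTH):
    """Skip the full rows above this one, then count across."""
    return (row - 1) * width + column


def lookup(column, row):
    # Python lists count from 0, so step back by one.
    return cells[box_number(column, row) - 1]
```

Third column, second row: one full row of four boxes, plus three more, is box #7. One multiply and one add, no matter how big the table gets.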
But then some cheeky programmer looks at the data and says, “I can’t afford the luxury of squandering sixteen whole boxes like this. What am I, Donald Trump? This isn’t a supercomputer with endless memory! You know what? I’ll bet there’s a better way to store this.” And then the programmer invents the quad tree.
I’ve already explained how these work way back in part 2, so let’s not go over that again. The point is that I can no longer look things up the way I did before. If I want the third column, second row, then I have to look inside a box, inside a box, inside a box. There is no shortcut to getting there. It’s a tradeoff. We’re trading speed, code clarity, and convenience in exchange for not using up so much dang memory. That’s a lot to give up, and we wouldn’t even contemplate this if not for the fact that 3-dimensional data (like our cube world) gets really, really big, really fast. Width times height times depth is a simple calculation with terrifying implications.
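For comparison, here is a minimal quadtree sketch in Python over the same 4×4 table. It is not the project's code. It assumes a square grid whose size is a power of 2, stores a uniform square as a bare color, and stores anything else as a list of four smaller squares. Coordinates count from 0 here.

```python
def build(grid, x, y, size):
    """Return a color if the square is all one color, else four children."""
    first = grid[y][x]
    if all(grid[y + dy][x + dx] == first
           for dy in range(size) for dx in range(size)):
        return first
    half = size // 2
    # Child order: top-left, top-right, bottom-left, bottom-right.
    return [build(grid, x, y, half), build(grid, x + half, y, half),
            build(grid, x, y + half, half), build(grid, x + half, y + half, half)]


def lookup(node, x, y, size):
    """Walk box-inside-a-box until we hit a color. Returns (color, hops)."""
    hops = 0
    while isinstance(node, list):
        half = size // 2
        node = node[(1 if x >= half else 0) + (2 if y >= half else 0)]
        x, y, size = x % half, y % half, half
        hops += 1
    return node, hops


grid = [
    ["red", "red", "yellow", "blue"],
    ["red", "red", "blue", "yellow"],
    ["green", "blue", "green", "green"],
    ["yellow", "yellow", "green", "green"],
]
tree = build(grid, 0, 0, 4)
```

The all-red and all-green corners collapse into a single value each, which is where the memory savings come from. The price is that `lookup(tree, 2, 1, 4)` (third column, second row) has to step through two boxes before it finds its answer, where the flat list needed one bit of arithmetic.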
But you don’t want to store the entire world in memory at once, not even in a tree. For an open-world game, the data wouldn’t fit in memory even with a quadtree or octree. Also, if the world were 2 kilometers wide (not very big, and assuming cubes about a meter across) then every single lookup would take 11 hops. You’d need to look at the box-within-a-box-within-a-box, 11 levels deep.
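Both numbers here are easy to check. The sketch below assumes cubes one meter across and rounds 2 kilometers up to 2,048 cells so the width is a power of 2. Those are assumptions on my part, as are the function names.

```python
def hops_to_single_cell(width_in_cells):
    """How many times a tree must halve the width to get down to one cell."""
    hops = 0
    while width_in_cells > 1:
        width_in_cells //= 2
        hops += 1
    return hops


def dense_cell_count(width, height, depth):
    """Cells needed if every position is stored, as in a plain 3D grid."""
    return width * height * depth
```

`hops_to_single_cell(2048)` comes out to 11, matching the figure above. As for memory, a purely hypothetical world of 2,048 × 2,048 × 128 cells would need 536,870,912 entries in a flat grid, before you store anything more interesting than one value per cell.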
So what we want is a hybrid system. We want the convenient lookups of using a grid mixed with the memory savings of using a tree. We want a grid… of trees.
Ideally, your trees should have a maximum size of n, where n is the largest power of 2 that’s likely to be homogeneous. Look at your giant data set. What’s the largest area of same-squares? If you never see an area larger than 16×16 same-color squares, then there’s no reason to make your trees larger than that.
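A grid of trees might look something like this. It is a 2D Python sketch under stated assumptions, not the project's code: `TREE_SIZE` is a hypothetical tile size, `get` is any function that returns the value at a position, and the world's width and height are multiples of `TREE_SIZE`.

```python
TREE_SIZE = 4  # hypothetical: largest power-of-2 square likely to be uniform


def build(get, x, y, size):
    """Quadtree over one square: a bare value if uniform, else four children."""
    first = get(x, y)
    if all(get(x + dx, y + dy) == first
           for dy in range(size) for dx in range(size)):
        return first
    h = size // 2
    return [build(get, x, y, h), build(get, x + h, y, h),
            build(get, x, y + h, h), build(get, x + h, y + h, h)]


def build_grid_of_trees(get, width, height):
    """The outer grid: one small tree per TREE_SIZE x TREE_SIZE tile."""
    return {(tx, ty): build(get, tx * TREE_SIZE, ty * TREE_SIZE, TREE_SIZE)
            for ty in range(height // TREE_SIZE)
            for tx in range(width // TREE_SIZE)}


def lookup(trees, x, y):
    # Grid step: plain arithmetic picks the tile, no searching.
    node = trees[(x // TREE_SIZE, y // TREE_SIZE)]
    # Tree step: drill down inside that one small tree.
    x, y, size = x % TREE_SIZE, y % TREE_SIZE, TREE_SIZE
    while isinstance(node, list):
        size //= 2
        node = node[(1 if x >= size else 0) + (2 if y >= size else 0)]
        x, y = x % size, y % size
    return node
```

The number of hops inside a tree is capped by `TREE_SIZE` instead of by the size of the world, and a tile that is all one thing still collapses to a single value. Using a dictionary for the outer grid also means tiles that aren't loaded simply aren't there.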
Which brings me to the structure of project Octant:
So when we want a particular cube, we do a little math to figure out what column it would be in. We look up that column (if it’s available) and ask for the related node. From the node we grab the octree, and from there we drill down to the cell in question. So our worst-case scenario is:
scene » column » node » octree16 » octree8 » octree4 » octree2 » octree1 » cell.
That’s a lot of hops. Things get really fun when one cell needs to look up the cell right next to it, and it takes 9 hops to reach its next-door neighbor.
I was rather worried about this. I mean, each empty cell needs to look up all six of its neighbors to see what faces it needs to draw. (Since a cube has six faces.) Six queries times nine hops sounded like a LOT of wasted time. I added a bit of code to allow “backtracking”. I made octrees aware of their parents so that the 2x2x2 octree would be able to reach up and ask the 4x4x4 octree for a particular cube. If it didn’t have it, it would continue to pass the request up the chain. I figured that since the vast majority of lookups were for cubes that were “next door”, I’d see some big savings. Hopping up one level and down one level ought to be faster than going down through all nine levels.
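The backtracking idea can be sketched roughly as follows. This is a toy Python octree written for illustration, not the project's implementation: the `Node` class, its method names, and the hop counting are all assumptions made for the example.

```python
class Node:
    """A cubic octree node. Leaves hold a value; branches hold 8 children."""

    def __init__(self, origin, size, parent=None):
        self.origin, self.size, self.parent = origin, size, parent
        self.children = None  # list of 8 Nodes once split
        self.value = "air"    # only meaningful on leaves

    def contains(self, p):
        return all(o <= c < o + self.size for o, c in zip(self.origin, p))

    def child_for(self, p):
        half = self.size // 2
        bx, by, bz = (1 if c >= o + half else 0 for o, c in zip(self.origin, p))
        return self.children[bx + 2 * by + 4 * bz]

    def split(self):
        half = self.size // 2
        ox, oy, oz = self.origin
        self.children = [
            Node((ox + dx * half, oy + dy * half, oz + dz * half), half, self)
            for dz in (0, 1) for dy in (0, 1) for dx in (0, 1)]
        for child in self.children:
            child.value = self.value

    def set(self, p, value):
        """Split down to a 1x1x1 leaf at p, store the value, return the leaf."""
        node = self
        while node.size > 1:
            if node.children is None:
                node.split()
            node = node.child_for(p)
        node.value = value
        return node

    def find(self, p, hops=0):
        """Return (leaf, hops). Pass the request up if p is outside this node."""
        if not self.contains(p):
            if self.parent is None:
                return None, hops  # off the edge of this tree entirely
            return self.parent.find(p, hops + 1)
        if self.children is None:
            return self, hops
        return self.child_for(p).find(p, hops + 1)


root = Node((0, 0, 0), 16)
cell = root.set((4, 4, 4), "red")
```

In this toy, `cell.find((5, 4, 4))` reaches its neighbor in 2 hops (up one, down one) where `root.find((5, 4, 4))` takes 4. Asking for `(3, 4, 4)` is less kind: that neighbor sits across a larger boundary, so the request climbs three levels before it can come back down, and ends up costing more than starting from the root. How much backtracking saves depends on which neighbor you ask for.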
Turns out I was wrong. The time needed to construct a single node went from 180ms to 170ms. That is a very small gain. I expected some massive jump in performance, and instead I got what? A 6% boost?
Still, this is exactly the sort of thing I wanted to play around with when I started the project. It’s sort of interesting to experiment with things and see how they behave.
I’m not totally sold on the structure I outlined above. It’s not terribly complex (by the standards of game engines) and I’m still fiddling with it, looking for where the performance bottlenecks might be. I might discover that this design is flawed in some way. Or maybe I’ll come to the same conclusion Goodfellow did, and end up storing everything in a pure grid. We’ll see what we find.