|Programming||By Shamus||Aug 23, 2013||96 comments|
I worked at Activeworlds for a lot of years. Activeworlds is a social / gaming world along the lines of Second Life or Roblox. It’s a virtual world with user-made content. The experience gave me some interesting phobias regarding CPU cycles.
Perhaps some anecdotes would help. For the sake of argument, let’s say these are all taking place around 2003 or so.
Joe User is building himself a virtual office in Activeworlds. He wants dark windows with a heavy tint, but the object library only has these windows with 50% transparency. Joe doesn’t understand transparency. He’s not a graphics artist or a programmer. He’s just a regular person, and to him tinted windows are tinted because they’re “dark”. He tries changing the color of the window from the default blue to black, but confusingly it doesn’t help. He can still see through the window just fine.
He makes a copy of the window to try another color when he has a eureka moment. Looking through both windows – one in front of the other – really makes a huge difference. What’s really happening is that the first window is blocking 50% of the outside color, and the second window is blocking 50% of the remainder for a final opacity of 75%. Joe doesn’t know this. All he knows is that this looks nicer. He makes another copy, and it’s better still! This is clearly the key to success. Two more windows perfect the look, giving him an overall opacity of 97% or so. He’s got five windows stacked up here. He nudges them so they’re only a centimeter apart.
If you’re a professional, your eye is probably twitching by this point. This isn’t art, it’s sabotage. Alpha surfaces (like our partly transparent windows) must be sorted before rendering. It’s a constant struggle to limit the number of transparent surfaces you’ve got in the scene and you want to be very careful about situations where the user will end up looking through multiple alpha polygons at the same time. The program has to get the distance to each surface, then shuffle them around and put them in order from furthest to closest before they can be drawn. All of this work must be done by your CPU. (Maybe new graphics cards have some trick for this, but in 2003 this load went right to the CPU.)
But that’s not the bad part.
The bad part is when Joe grabs those five stacked windows and begins duplicating the group over and over to construct an office building out of them.
At a lot of development studios, if a member of the art team did this you could go to their cubicle and set them on fire without breaking company policy.
In another zone, Jane User is building herself a grand welcome area to her sea park. She couldn’t find any sea-themed stuff in the default library so she went online and searched for “3d models”. She found this awesome model of a dolphin balancing a ball on its nose. It was built in Maya or 3DS Max. She finds a converter and proceeds to take this thirty-thousand polygon dolphin and put it in her world as a statue, right near the welcome area. In fact, she builds two of them, facing each other, to form a little archway that new visitors will walk through when they arrive. She overlaps the models so that the beach balls are both in the same spot, making it look like the two dolphins are holding up the same ball. She stands back at a distance and admires her handiwork. Not bad, not bad at all.
She notices that this Activeworlds program is kinda slow all of a sudden. Oh well. Maybe they’ll get around to fixing this software one of these days.
That’s not the fun part.
The fun part is twenty minutes later when Bob Visitor arrives on his wobbly old 2001 laptop. He follows the path to the archway and walks through. As he passes under the beach ball(s), he’s actually occupying the bounding boxes of both dolphin statues at the same time. His computer is now performing collision checking on sixty thousand polygons and has basically stopped doing anything else. After a few seconds he figures Activeworlds has crashed. He closes the program.
Stupid buggy Activeworlds.
Then Jeff User comes along. He’s a kid. One of his friends tells him about this command you can put on objects that will cause them to display pictures from anywhere on the web. He’s figuring out this scripting language and he realizes you can set it up to show a whole bunch of images! He writes this long command to download all these gigantic (1280×1024, big for the time period) desktop wallpapers and animate them on a wall. It’s slow as hell for some reason, but it looks sweet!
Sadly for the people walking by outside, it slows them down too, even though they can’t see inside Jeff’s house.
Now, there were guidelines and rules and tutorials in place to encourage people not to do these destructive things, but you can probably guess how likely it is that young people will want to read technical documents before playing “a videogame”. This was exacerbated by the problem that people usually continued building and didn’t think about framerate until their computer began to slow down. And some people have a lot of tolerance when it comes to low framerates. And the icing on the cake is that everyone has vastly different computers. So one person with extreme tolerance for low framerate will use his brand-new lightning-fast computer and build until his framerate is in the single digits. Then other people show up with normal computers and it’s all madness and tears and crashing.
As one of the programmers trying to keep this system going, this setup made me paranoid. In a videogame, if the art team uses a texture eight times larger than allowed, or if they blow through their polygon budget by a factor of ten, it’s no big deal to the coders. Maybe the program crashes, maybe it runs slow. Either way, it’s not your problem. If you’re feeling super-nice you’ll put in a warning message to let the artist know they screwed up. You can’t do this when you’re dealing with user content. Especially when you’re a small company and you can’t afford to drive people away for being bad at 3D graphics.
I had to operate under the assumption that any system could begin devouring system resources at any time. Maybe too much collision detection. Or the animation system might begin shuffling massive texture billboards around in memory. Or the alpha sorter might go nuts. Or the raw polygon-pushing stuff. Or the character animation system. Anything. Anything can run out of CPU or memory at any time.
The result is that you have to build your application like a tank. It needs to be able to absorb and mitigate any level of art asset insanity.
The ramifications of this are kind of crazy. A regular programmer will nod sagely and suggest you deal with CPU usage spikes by putting the problem code in its own thread. But that doesn’t help when any code can become “problem code”, and running every single system in its own thread can introduce a whole new batch of problems.
It wasn’t until recently that I realized this paranoia was unusual. Even now, it’s really hard to make a system without asking myself, “But what if this takes a really long time for some insane reason?”
Let’s say you’ve got a physics simulation running at a nice smooth 60 frames a second. Say we toss a ball and watch it drop:
In an ideal world, if the computer is slow then we’ll get less frames, but we’ll just be seeing fewer timeslices of the exact same scene. If we go down to 10 frames a second, then we’ll end up missing 5 out of every 6 frames, but the ball will still follow the same trajectory:
But in reality, we end up with a badly mangled simulation that no longer works properly.
We get a different parabola because this isn’t linear movement. If the program says, “I’ll do six frames worth of accelerating right now, and then I’ll apply six frames worth of movement to my position.” then we’ll end up with the red curve above. Calculating 1% interest a day isn’t the same as 30% interest every 30 days.
And of course, what if there’s a wall in the way? The green line will bounce off it like a good physics simulation should, while the red line will skip over it. One frame it’s on the left side of the wall, and the next frame it’s on the other side of the wall, and we never got a frame where the ball was intersecting the wall and was available to bounce off of it.
Now the obvious solution is to just go back and do the per-frame calculations. If it comes time to update the ball and we’re five frames overdue, don’t just do one giant move times six. Do all six frames, one at a time, before going any further.
That’s a pretty good solution. Unless the physics simulation is what is causing the slowdowns in the first place.
That seems like a funny idea when we’re talking about a single ball arcing through the air, but if we get a few thousand of them (say, debris from that giant robot you just blew up) and they all need to collide with the scenery (say, the building right behind the robot) then your ‘lil CPU might have some serious math homework to do. If you’re five frames late because you were running physics simulations last frame, then attempting to run an extra six frames of simulation THIS frame is not a solution, it’s the start of a death-spiral.
I’m reasonably sure I’ve seen this death-spiral in commercial games. In Half-Life 2: Episode 2 there’s a scene at the start of the game where a bridge collapses. My computer was on the low end of the spectrum when the game came out, and this scene spiked exactly like our hypothetical runaway physics simulation. The frame rate cascaded downward, with each frame taking longer than the one before it util the game came grinding to a halt and the sound began stuttering. This ended in a long (ten second?) pause before the game snapped out of it again and returned to normal. I’m guessing the physics was trying to catch up, and each frame it was frantically trying to make up for the deficit, which only pushed it further in the hole. The only reason it didn’t crash the game was that the bridge collapse was scripted to take a fixed amount of time. If this was an ongoing thing there would have been no coming back.
You can put in some kind of safety-valve trigger that will skip the physics if they start killing you, although detecting and isolating performance problems on the fly from within the thing you’re trying to measure can be problematic. And even if you guess right, skipping the physics means that you’ve got different parts of the game running at different framerates and at different levels of fidelity, which can lead to strange problems in other areas.
But what if we just decided we didn’t care? What if instead of trying to detect, correct, and mitigate performance problems, we just built a program with no such safeguards? I’m going to find out. To wit:
This game is going to run at sixty frames a second. Period.
Instead of treating 20fps as “not the best”, I’ll treat it like a failure state. If the game doesn’t have the power it needs, it won’t skip non-critical things to speed up. It won’t run parts of the system at a lower framerate. It won’t try to catch up. If the computer slows down, the game slows down. (And to be clear, the game won’t go faster if you have a fast computer. It’s capped at 60fps.) If for some reason the game is played on a machine that can only keep up 30 frames a second, it will feel like some sort of half-speed bullet time.
This saves… well, it saves a ton of work. An unbelievable mountain of uncertain guesswork and difficult-to-test safeguards can just be eliminated. Since we’re not doing heavy-duty 3D, we can just assume the speed will be there.
I don’t know. I just clocked it, and at this point in the project I’m idle for about 730 milliseconds out of every second. So, I’m idle about three-quarters of the time. This is using a debug build with no optimizations.
I don’t know if this is a good idea, but I’m going to try it. I can always add more sophisticated time management later if I need to. To a certain extent, this is like a man announcing that when he takes the dog for a walk, he’s no longer going to wear the crash helmet, safety harness, steel-toed boots, and bulletproof vest. If nothing else, this will be a new experience for me.
And yes I did just write a 2,000 word entry on a feature I’m NOT putting in.
You can’t stop me! I’ve got the source code!