on Aug 9, 2011
Here is part 2 of my commentary on John Carmack’s Quakecon 2011 keynote. As before, the entire presentation is first, followed by my comments, with links to timestamps.
Carmack talks about the PS3 and the way its memory space is divided up. This has always been a hot-button topic when I’ve tackled it, but it’s been a few years since we had that conversation. Perhaps heads have cooled and we can try again…
This is a big part of why I was so frustrated with the PS3 hardware after release. Sony said things about how fast their processors were and about how much memory the thing had, and all of that was technically true while being irritatingly beside the point from an engineering standpoint. There’s a lot of memory there, but it’s Balkanized into these chunks with fixed uses. There are a lot of processors, but they all have fixed uses as well. So you end up with situations like this one, where the PS3 actually has less useful memory than the Xbox 360.
Sony got away with this because the explanation for why their machine wasn’t better was highly technical. Gabe Newell famously criticized the machine. Sony fans, most of whom are not programmers, couldn’t really understand the inherent problems in developing for highly unorthodox hardware, and wrote him off as a PS3 hater. I got the same treatment when I made fun of the thing. (Only making fun of Halo caused more fury and personal attacks.)
Imagine you’re running a warehouse filled with consumer goods, and it’s your job to move stuff around. Let’s say it’s the storage area for a big-box store. You need to receive new goods from the loading area, store them on shelves, and move the goods out to the other side when they’re ready to be put out for sale. In the Xbox warehouse, you have just one forklift, which can handle 2,000 lbs. In the PS3 warehouse, you have a forklift for moving consumer electronics, another forklift for moving kitchen appliances, another for furniture, another for clothing, another for toys, and another for “everything else”. All of these can handle 1,000 lbs. The people who designed the PS3 warehouse will tell you that you have six fork trucks that can move 1,000 lbs., which means you can move stuff three times as fast as those losers in the Xbox warehouse.
But the truth is that it’s really hard to keep all of those trucks moving without them getting in each other’s way. You only need to move toys once a week and furniture every other day, so those fork trucks are rarely used. The restrictions on what each truck can carry impose restrictions on how you can lay out your goods. For example, popular things like consumer electronics and kitchen appliances can’t be stored on the same aisle or the trucks would block each other, so one of them has to be moved to a less-optimal location. And finally, the trucks themselves take up a lot of floor space, leaving less room for moving around and storing goods.
In the end, the speed gains from having six forklifts are nearly negated by the various limitations. Worse yet, the extra expense is incredible, the job is a lot more complicated, and there are some things you simply can’t move because they’re too heavy for your trucks. The worst part is, everyone keeps telling you that you should be moving things three times as fast. When you try to explain that it’s not that simple, they tell you you suck. “Do you know how expensive this hardware is? If you can’t make it go faster then you’re the one with the problem.”
From a software engineering standpoint, it’s going to take a lot of work to put any of that extra PS3 power to use. That extra work will only benefit your PS3 version, and does nothing for the PC and Xbox 360 versions. If you don’t do that extra work, your game won’t be using the machine to its fullest and all that expensive processing power the user bought will be going to waste.
I still believe, as I asserted a couple of years ago, that the PS3 was engineered in such a way as to choke off the competing platforms by building a developmental wall around Sony to make porting difficult. This would have increased the number of de facto exclusives. However, with them trailing in market share, the wall is working the other way and keeping developers from bothering to port to this quirky, unconventional beast. Ironic justice, I suppose. (Assuming I’m right. Note that this is all conjecture on my part. I think it explains Sony’s behavior better than, “they were just dumb”, but we’ll never know what really happened inside the company unless someone writes a tell-all book.)
Blu-ray drives are slower (latency) than DVDs? I did not know this. That makes the earlier problems even more pronounced. If you decided to alleviate the congestion problems in the PS3 warehouse by keeping fewer goods on hand and simply ordering smaller batches of goods more often, now the ordering system is set up so that there’s a longer wait between the time you order more televisions and the point where they show up in receiving. And now I’ve probably stretched this warehouse metaphor too far.
To be fair, I don’t know how big the delta is between DVD and BR read speeds. Note that this is about latency, not throughput. BR can probably deliver more total data a second, but it takes longer between the time you request a block of data and the time when data actually arrives in memory. It’s another case of the PS3 hardware limitations exacerbating each other.
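The latency-versus-throughput distinction can be put into a rough formula: the time for one read is the fixed delay plus the size divided by the transfer rate. Here is a minimal sketch of that. The `Drive` struct and `read_time_ms` function are names made up for illustration, and any numbers fed into it are placeholders, not measurements of a real DVD or Blu-ray drive.

```cpp
#include <cassert>

// Hypothetical drive description. The figures passed in are placeholders,
// not measurements of any real DVD or Blu-ray drive.
struct Drive {
    double latency_ms;       // delay before the first byte arrives
    double throughput_mb_s;  // sustained transfer rate once data is flowing
};

// Time to satisfy one read request: fixed latency plus transfer time.
double read_time_ms(const Drive& d, double request_mb) {
    assert(d.throughput_mb_s > 0.0);
    return d.latency_ms + (request_mb / d.throughput_mb_s) * 1000.0;
}
```

With two invented drives, one with half the latency and the other with twice the throughput, the low-latency drive wins on small requests and the high-throughput drive wins on large ones. A game that streams lots of small blocks cares mostly about the first term.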
“Four levels of locality” – this is indeed a complicated system. At first the data is sitting on your optical media (DVD or BR). Then it gets pulled in, which takes a long time as computers reckon things. To spare you that wait the next time the data is needed, this data is saved in a temporary file on the hard drive. (I can’t believe this game will actually run on an Xbox with no hard drive. Amazing.) The data is also held in general memory. And finally, the data is put into graphics memory, where the texture is available for rendering.
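To make the four levels concrete, here is a rough sketch of the bookkeeping such a system might do. This is not how the actual engine works; `PageCache`, `Tier`, and the page names are all hypothetical, and the sketch ignores eviction policy, sizes, and asynchronous loading entirely.

```cpp
#include <array>
#include <cassert>
#include <set>
#include <string>

// Tiers ordered from slowest to fastest, following the description above.
enum Tier { Optical = 0, HardDrive = 1, MainMemory = 2, VideoMemory = 3 };

// Hypothetical bookkeeping: which texture pages are resident at each tier.
struct PageCache {
    std::array<std::set<std::string>, 4> resident;

    // Return the fastest tier that currently holds the page.
    // Everything is assumed to exist on the optical disc.
    Tier fastest_tier(const std::string& page) const {
        for (int t = VideoMemory; t > Optical; --t) {
            if (resident[t].count(page)) return static_cast<Tier>(t);
        }
        return Optical;
    }

    // Fetch a page for rendering: copy it into every tier above the one
    // where it was found, so the next request is cheaper.
    Tier fetch(const std::string& page) {
        assert(!page.empty());
        Tier found = fastest_tier(page);
        for (int t = found + 1; t <= VideoMemory; ++t) {
            resident[t].insert(page);
        }
        return found;  // where the data had to come from this time
    }
};
```

The first `fetch` of a page reports `Optical`, the slow path. A second `fetch` of the same page reports `VideoMemory`. If the page gets dropped from video memory, the next fetch only has to go back as far as main memory.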
When he says “Tex sub image two dee”, he’s talking about glTexSubImage2D (), which is an OpenGL function. This function takes a block of memory and places it into a texture in memory, or a portion of a texture. Going back to my example image:
You’d use this to replace the contents of one of those little squares. I use this heavily in Project Frontier, when generating the terrain textures. I’m sort of terrified to think about what might be going on under the hood in my program. Some of my textures are as large as 2048×2048, and if glTexSubImage2D () copies the entire thing in memory when I update a little 128×128 patch of it, then that’s a really painful performance hit. Best of all, it sounds like it only happens on some systems, or with some drivers. Wonderful.
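To put numbers on that worry, here is a CPU-side stand-in for what a sub-image update has to do. This is not OpenGL and says nothing about what any particular driver does; `Texture` and `update_patch` are hypothetical names, and the sketch assumes tightly packed RGBA at 4 bytes per pixel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// CPU-side stand-in for a texture: tightly packed RGBA, 4 bytes per pixel.
// This is NOT OpenGL; it only mimics the copy a sub-image update implies.
struct Texture {
    int width, height;
    std::vector<std::uint8_t> pixels;
    Texture(int w, int h)
        : width(w), height(h), pixels(static_cast<std::size_t>(w) * h * 4) {}
};

// Copy a patch into the texture at (x, y) and return the bytes written.
// With a real GL context the equivalent call would be roughly:
//   glTexSubImage2D(GL_TEXTURE_2D, 0, x, y, patch_w, patch_h,
//                   GL_RGBA, GL_UNSIGNED_BYTE, patch.data());
std::size_t update_patch(Texture& tex, int x, int y, int patch_w, int patch_h,
                         const std::vector<std::uint8_t>& patch) {
    assert(x >= 0 && y >= 0);
    assert(x + patch_w <= tex.width && y + patch_h <= tex.height);
    assert(patch.size() == static_cast<std::size_t>(patch_w) * patch_h * 4);
    const std::size_t row_bytes = static_cast<std::size_t>(patch_w) * 4;
    for (int row = 0; row < patch_h; ++row) {
        // Each patch row lands at a different offset in the big texture.
        std::size_t dst = (static_cast<std::size_t>(y + row) * tex.width + x) * 4;
        std::size_t src = static_cast<std::size_t>(row) * row_bytes;
        std::memcpy(tex.pixels.data() + dst, patch.data() + src, row_bytes);
    }
    return row_bytes * patch_h;
}
```

Under those assumptions a 128×128 patch is 64 KiB, while the whole 2048×2048 texture is 16 MiB. A driver that copies the entire texture for each small update would be moving 256 times more data than the update itself requires.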
It says a lot that even after these years of working with DirectX, Carmack still speaks (and perhaps thinks) in terms of OpenGL.
He’s explaining that glTexSubImage2D () has a lot of stuff happening in the background that can make it painfully inefficient in some circumstances. He’s explaining this to illustrate how the PC can still struggle, even though it’s ten times as powerful as a console. On the console side, the program has direct access to video memory. On the PC side, you have to navigate everything through the ever-changing landscape of graphics drivers, which might be doing all kinds of extra processing that you don’t need. It relates to the dirt road problem I discussed before.
“Intel’s current graphics hardware is getting decent”. I didn’t know this, and it’s an interesting twist. Basically, those crappy, built-in graphics cards on cheap PCs and laptops are approaching the power and functionality needed to run games properly. The problem has been that there was never much of an incentive for Intel (or other manufacturers) to make their built-in graphics hardware better. Why spend money on it? Most people won’t care, and the ones that do care are the ones who will probably buy a $200 graphics card anyway. So Intel just puts the cheapest hardware it can in there.
I’m sure they’re still putting the cheapest stuff in, but things have advanced so far that even the cheapest graphics hardware is good enough to nearly keep up with the current-gen consoles. If this trend continues and we don’t suddenly get another console generation, then we’ll see an increase in the number of PCs that can run games without needing to buy a graphics card. I doubt it will ever be enough to make the PC a true market rival to the consoles, but in the long run we might get a few more ports, and PC ports might be less horrible. In an ideal world, graphics cards will be for people who want to run the game at max settings on a ginormous monitor, and people content with medium settings won’t need one.
“A thousand characters should be enough”. Sigh. C++ takes a lot of flak for this, because this is a really common problem in the language. A programmer needs to reserve some memory to hold something. A person’s name, a directory name, a list of available fonts, or whatever. You have no idea how big this data will be. The sloppy solution is to just imagine what you think will be the biggest number you’ll need, multiply by two, and use that. If I need to store the name of a place, I might think 40 characters is enough, so I’ll reserve 80 bytes “just in case”. Then someone from Taumatawhakatangihangakoauauotamateaturipukakapikimaungahoronukupokaiwhenuakitanatahu shows up, the program tries to store the 85-character name in the 80-character slot, and Things Go Wrong. (If you’re lucky, you just crash.)
The forward-thinking (but still lazy) programmer might avert this by reserving TEN times the space he thinks will be needed, but that will eventually lead to a lot of wasted memory, and won’t really solve the problem in a guaranteed-safe way.
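Here is a small sketch of the two approaches side by side: a fixed-size slot that at least checks before writing, and a growable string that sidesteps the guess entirely. The function names and the `kSlotSize` constant are made up for illustration. This is one reasonable way to handle it, not the only one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

const std::size_t kSlotSize = 80;  // the "40 characters, doubled" guess

// The fixed-size approach, made safe: refuse names that do not fit
// (including the terminating '\0') instead of writing past the buffer.
bool store_fixed(char (&slot)[kSlotSize], const char* name) {
    assert(name != nullptr);
    std::size_t len = std::strlen(name);
    if (len + 1 > kSlotSize) return false;  // would overflow: reject
    std::memcpy(slot, name, len + 1);       // copies the '\0' as well
    return true;
}

// The growable approach: std::string allocates whatever the name needs.
std::string store_dynamic(const char* name) {
    assert(name != nullptr);
    return std::string(name);
}
```

The checked version still can’t store the long name, but it fails in a way the caller can see and handle rather than stomping on whatever sits next to the buffer. The `std::string` version just works, at the cost of a heap allocation.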
As Carmack warned elsewhere in his talk, this is the stuff of holy wars.
“This language is flawed because you can shoot yourself in the foot.”
“ANY good tool can be used wrong. It’s only a problem if you’re a bad programmer.”
Programming languages all have trade-offs. Readability. Learning curve. Performance. Portability. Consistency. Availability and usefulness of third-party libraries. The cost of maintaining code. Breadth of built-in tools. Flexibility. When people argue about which language is best, they’re usually gathered around the fault lines formed by these various trade-offs.
The tear-line problem:
Your monitor updates at a fixed interval. Your videogame does not. If the game isn’t done drawing the next frame, the monitor can simply repeat the last image you were shown, and you’ll have to wait for the next monitor refresh to see the new image. The upshot is that on a 60Hz monitor, if the game dips below 60fps, you’re effectively going to be seeing 30fps. If the game misses 30fps, it dips to 20fps, and then to 15fps. If the game is running at (say) 20fps on average, then some frames take twice as long as others to appear, and the game will shift between 15 and 30 fps. I can feel and see this when it happens, and it’s annoying. (The same effect of bouncing between 30 and 60fps is much harder for me to detect. It’s one of those things that some people can’t notice, but which drives other people crazy.)
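The stair-step effect falls out of simple arithmetic. Here is a sketch under a simplified model: a 60Hz display by default, and every frame staying on screen for a whole number of refresh intervals. The function name is made up, and real drivers and buffering schemes can behave differently.

```cpp
#include <cassert>
#include <cmath>

// Effective frame rate with vertical sync on, in a simplified model:
// every frame stays on screen for a whole number of refresh intervals.
double vsync_fps(double frame_ms, double refresh_hz = 60.0) {
    assert(frame_ms > 0.0 && refresh_hz > 0.0);
    double refresh_ms = 1000.0 / refresh_hz;
    // Number of refreshes the frame occupies, rounded up to a whole count.
    double refreshes = std::ceil(frame_ms / refresh_ms);
    return refresh_hz / refreshes;
}
```

In this model a frame that takes 16 ms to draw is shown at 60fps, but a frame that takes 17 ms just misses the refresh and is shown at 30fps. A tiny slowdown in rendering turns into a halving of what you see.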
The other solution is for the game to show the new frame as soon as it’s done, even if it’s not time for a new frame. You might have noticed this option in games, usually labeled as “disable vertical sync”. It will just slap the new image into place, ready-or-not. This leads to situations where part of the screen is the new frame and part of the screen is the old one, like so:
Vertical tearing. (Simulated.)
I’ll add that 60fps is really, really hard to maintain in a complex game with lots of things going on. You need to have your threads and your scheduling working just perfectly. 30fps is many, many times easier to pull off. I’m anxious to play Rage, just to see how much I can feel the difference. It’s been years since I played a new game that ran at that speed.
We’ll wrap up this series tomorrow.
Shamus Young is an old-school OpenGL programmer, author, and composer. He runs this site and if anything is broken you should probably blame him.