It’s been a running joke for a couple of years that half the games coming out have the word “Dead” in the title. (Also, games with ‘half’ in the title are dead.) Dead Space, Dead Island, Deadlight, Left 4 Dead, etc. So it got me thinking: Just how common is the practice, really? Is the word “Dead” really as played out as it seems, or is this a case of confirmation bias run amok? Aside from “dead”, what are the top overused words in game titles? Are there any overused words that we just don’t notice?
So I’m going to find out. Since I don’t want to run through and manually enter the name of every videogame ever made, I need a way to automate this. The path of least resistance seems to be to use Steam’s library. Being a PC platform, Steam is obviously missing a ton of games. But this should be close enough for our purposes. This isn’t science, it’s trivia.
Sadly, I can’t find a clean way to extract a full list of titles from Steam. The closest I can come is this file, which looks kind of promising at first. But there’s no way of knowing how old the list is, or if all games are listed.
Worse, the list includes a lot of non-game stuff like DLC and trailers. Which means that if there was a game called Dead Shooter, then it might appear several times in our list like so:
Dead Shooter
Dead Shooter Guns Pack 1
Dead Shooter Release Trailer
Dead Shooter Launch Trailer
Dead Shooter Beta
Dead Shooter Guns Pack 2
Dead Shooter GOTY Edition
Dead Shooter Brady Guide
Ugh. I feel strongly that we should not count “Dead Shooter” EIGHT TIMES. There’s only one game called Dead Shooter so it should only count once.
|This wall of text is boring, so enjoy this image of a woman drawing random database crap on a window. Feel free to critique her schema. It actually looks pretty solid to me, except it looks like you can only order one product at a time. Also, I really hope that “password” is actually the hash and not the raw text.|
In an ideal world I could just query the Steam database and filter out things like trailers, betas, and DLC. But apparently the paranoid people at Valve don’t want to allow open database access to all the random strangers on the internet? (The nerve!) So we have to settle for parsing this text file and trying to untangle it ourselves.
Some DLC has the word DLC in the title. But not all of it. Sometimes it will use the word “Pack”. Sometimes there won’t be any special descriptor at all: “Dead Shooter – More Guns”. There’s no way to know if that’s DLC, a trailer, a sequel, or a special release of the game without going to the Store page and looking at it. And I’m not going to manually inspect all 15 thousand games in this list. Sorry. (…that you’re demanding something so unreasonable. What’s your problem, anyway? Sheesh!)
So here’s the plan:
- Parse a large text file and strip out all the crap that isn’t the title of something.
- Remove everything that can be identified as DLC, beta, trailers, etc.
- Do a word frequency count on the remaining titles.
- A few words shouldn’t be tracked. Sequel numbers aren’t very interesting for this project. Neither are words like “the” and “of”.
- Find the top N most overused words and list them, along with their games.
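Whatever tool ends up doing the job, the steps above boil down to something like the sketch below. It's in Python purely for illustration, and the marker words and stop words are made-up stand-ins, not the real filter lists. Each title counts once per word, no matter how often the word repeats inside it.

```python
import re
from collections import defaultdict

# Hypothetical marker words that suggest a line is not a game.
NON_GAME_MARKERS = {"dlc", "trailer", "beta", "pack", "demo", "soundtrack"}
# Hypothetical stop words; a real list would be longer.
STOP_WORDS = {"the", "of", "a", "an", "and", "in", "to", "for"}


def words_in(title):
    """Lowercase a title and split it into words."""
    return re.findall(r"[a-z0-9]+(?:'[a-z0-9]+)*", title.lower())


def looks_like_game(title):
    """Reject titles containing an obvious DLC/trailer/beta marker."""
    return not any(word in NON_GAME_MARKERS for word in words_in(title))


def count_title_words(titles):
    """Map each interesting word to the set of titles that contain it."""
    games_by_word = defaultdict(set)
    for title in titles:
        title = title.strip()
        if not title or not looks_like_game(title):
            continue
        for word in set(words_in(title)):
            # Skip filler words and plain sequel numbers.
            if word in STOP_WORDS or word.isdigit():
                continue
            games_by_word[word].add(title)
    return games_by_word


def top_words(games_by_word, n):
    """Return the n most common words, each with its sorted list of games."""
    ranked = sorted(games_by_word.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(word, sorted(games)) for word, games in ranked[:n]]


sample = [
    "Dead Shooter",
    "Dead Shooter Guns Pack 1",
    "Dead Shooter Release Trailer",
    "Dead Island",
    "The Isle of Quests 2",
]
for word, games in top_words(count_title_words(sample), 3):
    print(word, len(games), games)
```

Note that something like “Dead Shooter – More Guns” would sail right through this filter, since nothing in the title marks it as DLC.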
We need to find the most convenient tool for this particular job.
|I actually own tools like this, except my set also includes a special screwdriver that is always magically the kind I don’t need at the moment.|
So we need to use a language or a script that’s good at doing complex conditional text parsing. I know some people will say Perl is a good tool for this job, but I can’t tell the difference between valid Perl and something I typed with my face. (Is there a difference?) This could also be a good job for Python, but I don’t know Python at the moment. I always find myself in one of two states:
- This is probably a good job for Python, but I don’t want to stop working on this project to learn a new language. I’ll learn it later when I’m not so busy.
- Man, I really ought to learn Python. But I don’t need it for anything right now. I’ll wait until I have a project I can use it on.
So I guess I’ll use everyone’s favorite language, PHP. And to be fair, I think PHP is a good fit here. When you don’t care about stability, security, or performance, and where code maintenance isn’t a concern, then PHP can often be a decent choice.
So I write some PHP to chew through the text and give us the goods.
The result? A complete mess. As usual, it’s Activision’s fault.
The Call of Duty franchise has a ridiculous amount of DLC and trailers, none of which have “DLC” or “Trailer” in the title. So we get page after page of stuff like, “Call of Duty, Call of Duty Singleplayer, Call of Duty 2, Call of Duty 2 Singleplayer, Call of Duty: United Offensive, Call of Duty: United Offensive Singleplayer.” This shoots both “Call” and “Duty” to the top of the list. There are “only” about 12 CoD titles on the PC, but my list is showing nearly one hundred. And there’s no good way to filter this except to remove everything with “Call of Duty” in the title. And that’s fine. Without this one franchise the words “Call” and “Duty” aren’t common at all and shouldn’t appear on our list.
Other notable offenders are Company of Heroes and Total War, which pollute the list with a ridiculous flood of DLC. The last major offender is the Train Simulator games, which have a million little DLC packs that all have the word “class” in the title.
I filter all that crap out. We’re looking for overused words in game titles, not over-monetized games.
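This kind of franchise filter is about as blunt as it sounds. A minimal sketch, in Python for illustration rather than the PHP actually used, and assuming a simple case-insensitive substring match (the exact match strings are guesses based on the franchises named above):

```python
# Franchises named in the post as DLC-heavy. The exact strings are assumptions.
NOISY_FRANCHISES = (
    "call of duty",
    "company of heroes",
    "total war",
    "train simulator",
)


def drop_noisy_franchises(titles):
    """Keep only titles that don't mention any blocklisted franchise."""
    return [
        title
        for title in titles
        if not any(name in title.lower() for name in NOISY_FRANCHISES)
    ]


print(drop_noisy_franchises(["Call of Duty 2 Singleplayer", "Dead Shooter"]))
```

The obvious cost is that the real games in those franchises vanish along with their DLC.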
There’s one last round of filtering we need to do. We need to remove a lot of descriptive words: Gold, Steam, Online, Episode, and Game. Also sequel numbers and years. Those words are really common, but while “Dead Shooter” seems like it’s overusing the “dead” word, I don’t think anyone minds when an MMO is called “X Online” or when a re-release is called “Gold Edition”. And “Shoot Guy 2014” is arguably just as good a title as “Shoot Guy VII” and “Shoot Guy 7”. All we care about here is the “Shoot Guy” part, not the sequel identifier. These words are descriptive and helpful to the consumer and I don’t think it’s fair to count them as overused. They make game titles less confusing, while putting the word “Dead” in everything makes them more confusing.
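Here's roughly what that last pass might look like. Again, this is a Python sketch and not the original PHP. The descriptive words come from the list above; the Roman numeral pattern is my own assumption and only covers 1 through 99.

```python
import re

# Descriptive words named in the post.
DESCRIPTIVE_WORDS = {"gold", "steam", "online", "episode", "game"}

# Roman numerals from i to xcix. Caveat: this also swallows real words
# that happen to be numerals, such as a lone "i" or "x".
ROMAN_NUMERAL = re.compile(r"^(?=.)(xc|xl|l?x{0,3})(ix|iv|v?i{0,3})$")


def is_sequel_marker(word):
    """True for plain numbers (7, 2014) and small Roman numerals (vii)."""
    return word.isdigit() or bool(ROMAN_NUMERAL.match(word))


def core_words(title):
    """Return the words of a title minus descriptors and sequel markers."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return [
        w for w in words
        if w not in DESCRIPTIVE_WORDS and not is_sequel_marker(w)
    ]


print(core_words("Shoot Guy 2014"))            # ['shoot', 'guy']
print(core_words("Shoot Guy VII"))             # ['shoot', 'guy']
print(core_words("Shoot Guy 7 Gold Edition"))  # ['shoot', 'guy', 'edition']
```

All three versions of “Shoot Guy” reduce to the same two words, which is the point: only the “Shoot Guy” part gets counted.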
So after filtering out as much noise as I can, here are the top 20 most overused words in game titles:
1. World – 129 titles
2. Dark – 120 titles
3. Star – 107 titles
4. Space – 98 titles
5. Quest – 89 titles
6. Battle – 89 titles
7. Dead – 79 titles
8. Magic – 78 titles
9. Black – 78 titles
10. Ghost – 76 titles
11. Wars – 72 titles
12. Simulator – 66 titles
13. City – 63 titles
14. Kings – 62 titles
15. Dungeon – 61 titles
16. Rise – 61 titles
17. Dragon – 57 titles
18. Deluxe – 56 titles
19. Maker – 54 titles
20. Evil – 54 titles
So that’s something of a surprise to me. I hadn’t ever noticed the overuse of “World” or “Space”. And “dead” – which I expected would be one of the big offenders on this list – is fighting for seventh place.
The list isn’t perfect. I think Crusader Kings DLC is propping up #14, and I noticed #20 counts all games with both “Evil” and “Devil”. “Ghosts” is propped up by Call of Duty: Ghosts and its endless flood of trailers and DLC. I could probably find other flaws in the list if I went digging for them, but this basically satisfied my curiosity. If you want to see it in more detail, here is the top 20 list, including the games.
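The “Evil” and “Devil” overlap is the sort of thing that happens when a word is matched as a substring instead of as a whole word. The code isn't shown here, so treat that as a guess about the cause. A small Python sketch of the difference, using made-up titles:

```python
import re

# Made-up sample titles, in the spirit of "Dead Shooter".
titles = [
    "Evil Shooter",
    "Devil Shooter",
    "Shoot Guy: Evil Rising",
    "Daredevil Guy",
]


def count_substring(word, titles):
    """Count titles containing the word anywhere, even inside another word."""
    return sum(1 for t in titles if word.lower() in t.lower())


def count_whole_word(word, titles):
    """Count titles containing the word as a standalone word."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return sum(1 for t in titles if pattern.search(t))


print(count_substring("evil", titles))   # 4, "Devil" and "Daredevil" sneak in
print(count_whole_word("evil", titles))  # 2
```

Whole-word matching would separate “Evil” from “Devil”, but it would also stop “Ghost” from matching “Ghosts”, so which one is right depends on whether plurals should count together.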