NOTE: This post is a little less thought-out than my usual programming posts. This was written pretty much on the fly as I was experimenting with stuff and not after I’d reflected on it. I’m not even sure it will make sense. Give it a try.
Today we’re going to be talking about Threes!, an iOS game. I haven’t played that version, but I’ve played this web-based clone. For the purposes of this discussion, you should probably go play the game, get addicted for a few days (everyone does) and then come back here once the mania passes. It will be easier to follow the discussion that way.
After doing that, you might want to read this article from Touch Arcade about someone who wrote an AI to play the game, which revealed some interesting things about the mechanics.
If you don’t have that kind of time, then here’s a basic run-down of the gameplay:
You play using the arrow keys. Tiles will attempt to move in the given direction. If a blue slides into a red (or vice-versa) they merge to form a 3. From there it follows a simple pattern of matching like with like. 3+3=6. 6+6=12. 12+12=24. 24+24=48. And so on. The trick is that every time you move, a new tile is added to the board. If I shift the pieces up, then a new tile slides in on the bottom row. The game ends when the board fills up such that no more moves are possible.
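Here is a minimal C++ sketch of one row sliding left under the original rules, to make the movement rule concrete. It assumes tiles are stored as plain integers: 0 for an empty cell, 1 for blue, 2 for red. It also assumes that each press moves a tile at most one cell, with everything behind the first moving tile shifting along with it. That is the usual description of Threes!, but this post doesn't state it. The names `Row`, `canMergeOriginal`, and `shiftRowLeft` are made up for illustration.

```cpp
#include <array>
#include <cstddef>

// Hypothetical encoding: 0 = empty, 1 = blue, 2 = red, 3 and up = white tiles.
using Row = std::array<int, 4>;

// Original rules: a 1 and a 2 make a 3; from 3 upward only equal tiles merge.
bool canMergeOriginal(int a, int b) {
    if (a == 0 || b == 0) return false;
    if (a + b == 3) return true;  // blue + red, in either order
    return a == b && a >= 3;      // 3+3, 6+6, 12+12, ...
}

// Slides a row one step toward index 0 (assumed Threes!-style movement).
// The first tile that can move (into an empty cell or onto a mergeable
// neighbour) moves, and every tile behind it shifts along with it.
// Returns false if nothing in the row could move.
bool shiftRowLeft(Row& row) {
    for (std::size_t i = 1; i < row.size(); ++i) {
        if (row[i] == 0) continue;
        if (row[i - 1] == 0 || canMergeOriginal(row[i - 1], row[i])) {
            row[i - 1] += row[i];  // a plain move adds to 0; a merge sums the pair
            for (std::size_t j = i; j + 1 < row.size(); ++j) row[j] = row[j + 1];
            row.back() = 0;
            return true;
        }
    }
    return false;
}
```

A row of four 1s can never move left. None of its cells are empty, and no 1 can merge with another 1.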
So your apparent objective is to keep merging tiles to make ever-larger numbers. But the actual challenge is to simply merge tiles faster than they appear to keep the board from filling in. If you play a couple of times, you’ll probably get a score of a few hundred.
You normally expect your scores to go up as you play a game. Over time, your skill improves and you’re able to do better. Except, that’s not quite how things went for me. Sure, I repeatedly broke my high score, eventually playing a game all the way to about 7,500 (it’s been reported that scores in excess of 21,000 are possible). But mixed in there were still a lot of 150-point games. When that kind of thing happened I always assumed that I had stopped paying attention. But this kept happening, no matter how hard I “tried”. Some games dead-ended early and some went a long way, and my results didn’t seem to line up with how much effort I was putting in.
This makes me think that the game has a huge element of luck. I wanted to play around with this idea, so I decided to make my own version of the game so I could explore the mechanics.
(All of this is written in C++ using old-school OpenGL. It’s overkill for an afternoon project like this, but I’ve already got the boilerplate code handy and using that is faster than learning Python or whatever you kids use for your prototyping work these days.)
First off, this business with the red and blue tiles is kind of suspect. The player needs equal numbers of red and blue tiles in order to combine them all. So if I get four red tiles in succession, they will eat up a quarter of my play area and I’ll have no way to get rid of them. (According to the Touch Arcade article, the RNG compensates for this, but it takes several moves for the needed pieces to show up. Enough time to kill a game.) That can doom a game through no fault of the user. Having four more of one color than the other is rare (for varying definitions of “rare”) at any particular moment. But in the course of a game that lasts 100 or so moves, it starts to become likely. Actually, it’s worse than that. It’s “likely” in the sense that it will happen in some games and not others. I suspect that my long-running games are ones where this sort of thing, against the odds, didn’t happen, thus letting me squeak through those tough points in the game where you’ve got a lot of high-value pieces on the board that aren’t quite ready to combine.
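A toy Monte Carlo model in C++ gives a rough sense of how likely a four-tile surplus is. The model assumes that every new tile is a 1 or a 2 with equal odds, independent of earlier draws. The real game differs: its RNG reportedly compensates. A red+blue merge removes one of each color, so in this model the on-board surplus of one color always equals reds dealt minus blues dealt. That makes the surplus a simple random walk, and the question becomes how often the walk strays four steps from zero. `chanceOfSurplus` is a hypothetical helper, not code from the game or from this post.

```cpp
#include <cstdlib>
#include <random>

// Toy model (an assumption, not the real game's RNG): every new tile is red
// or blue with equal odds, independently. Since a red+blue merge removes one
// of each, the on-board surplus of one colour equals reds dealt minus blues
// dealt, i.e. a simple random walk.
// Returns the fraction of simulated games in which that surplus reaches
// `threshold` at some point within `moves` deals.
double chanceOfSurplus(int moves, int threshold, int trials, unsigned seed) {
    std::mt19937 rng(seed);
    std::bernoulli_distribution isRed(0.5);
    int hits = 0;
    for (int t = 0; t < trials; ++t) {
        int surplus = 0;  // reds dealt minus blues dealt
        for (int m = 0; m < moves; ++m) {
            surplus += isRed(rng) ? 1 : -1;
            if (std::abs(surplus) >= threshold) {
                ++hits;
                break;
            }
        }
    }
    return static_cast<double>(hits) / trials;
}
```

For a 100-move game and a threshold of 4, this toy model puts the probability well above one half. That fits the hunch that a color-imbalance dead end is something that happens in a good fraction of games, not a freak event.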
If this were done the other way with a series of direct combinations, then this randomness would be mitigated. If 1+1=2 and (humor me here) 2+2=3, then there wouldn’t be any combination of four low-value tiles it could throw at the player that would be mutually inert. Something would be able to combine.
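This claim is easy to check by brute force. The C++ sketch below enumerates all 16 ordered four-tile hands drawn from the two lowest tile values. It then counts the hands in which no pair can merge at all. It deliberately ignores board position, even though real tiles also have to be adjacent. The function names are illustrative.

```cpp
#include <array>

// Returns true if at least one pair among the four tiles can merge.
// Tiles are the two lowest values, written 1 and 2.
// originalRules: only 1+2 merges (Threes!). Otherwise equal tiles merge
// (1+1, 2+2), as in the direct-combination rules.
bool somePairMerges(const std::array<int, 4>& hand, bool originalRules) {
    for (int i = 0; i < 4; ++i) {
        for (int j = i + 1; j < 4; ++j) {
            bool merges = originalRules ? (hand[i] != hand[j])
                                        : (hand[i] == hand[j]);
            if (merges) return true;
        }
    }
    return false;
}

// Counts how many of the 16 ordered four-tile hands drawn from {1, 2}
// contain no mergeable pair at all.
int countInertHands(bool originalRules) {
    int inert = 0;
    for (int mask = 0; mask < 16; ++mask) {
        std::array<int, 4> hand{};
        for (int k = 0; k < 4; ++k) hand[k] = ((mask >> k) & 1) ? 2 : 1;
        if (!somePairMerges(hand, originalRules)) ++inert;
    }
    return inert;
}
```

Under the original rules, the two inert hands are all 1s and all 2s. Under direct combination, every hand of four contains a matching pair, so the count is zero.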
It’s entirely possible the original designer had a good reason for setting things up this way, but I don’t know what it was. Maybe the concern was that it would be too easy to “solve” the game without this randomness. Maybe everyone would end up with about the same score without it. (If this is true, then it means this is a game of luck where you use skill to reach as much of your determined-by-luck potential as possible. That’s not bad or anything. Lots of games work that way.)
I don’t know. But I’m going to build an alternate rule set for my version. In my rules, I’m going to use direct progression using powers of two. I know my powers of two (and more importantly, my square roots) a lot better than I know all these multiples of three, which will make it easier to wrap my head around the game. So 1+1=2, 2+2=4, 4+4=8, 8+8=16, and so on.
[Screenshot: I’ll explain the information on the left a little later on. Let’s just get the basics down first.]
Basing things on powers of two avoids the obviously ridiculous business of having 2+2=3. It also lets us use a cool shorthand for high-value tiles. 1,024 can be 1k and 1,048,576 can be 1M. (kilobyte and megabyte, respectively. It’s educational!)
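Here is a minimal C++ sketch of that shorthand. It assumes tile values are powers of two held in a 64-bit unsigned integer. The suffixes past M are an assumption that simply continues the pattern, and the function name is hypothetical.

```cpp
#include <cstdint>
#include <string>

// Formats a power-of-two tile value with binary-prefix shorthand:
// 1,024 -> "1k", 1,048,576 -> "1M". Values below 1,024 print as-is.
// Suffixes beyond "M" are an assumed continuation of the pattern.
std::string shortTileLabel(std::uint64_t value) {
    const char* suffixes[] = {"", "k", "M", "G", "T"};
    int i = 0;
    while (value >= 1024 && i < 4) {
        value /= 1024;
        ++i;
    }
    return std::to_string(value) + suffixes[i];
}
```

So 512 prints as 512, 2,048 as 2k, and 1,048,576 as 1M.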
So now I’m going to build an AI to play the game for me. It’s not very bright. It only looks at the next move and doesn’t attempt to plan several moves in advance. It just attempts to keep the board as clear as possible. Barring that, it will try to move combine-able tiles into place next to each other. For scoring, we don’t actually care about “points”. We’re just interested in how long a game lasts.
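Here is a C++ sketch of a one-move-lookahead rating and chooser. It uses the powers-of-two rules, where equal neighbors combine. Only the priority order comes from the post: empty space first, adjacent combine-able tiles second. The weights, the board type, and the function names are hypothetical. The sketch leaves out generating the four candidate boards, since that depends on the movement code.

```cpp
#include <array>

constexpr int kBoardSize = 4;
// 0 = empty, otherwise the tile's value.
using Board = std::array<std::array<int, kBoardSize>, kBoardSize>;

// Hypothetical heuristic: empty cells dominate, and equal neighbours
// (mergeable under the powers-of-two rules) act as a tie-breaker.
// The weights are made up for illustration.
int rateBoard(const Board& b) {
    const int kEmptyWeight = 10;
    const int kPairWeight = 1;
    int score = 0;
    for (int r = 0; r < kBoardSize; ++r) {
        for (int c = 0; c < kBoardSize; ++c) {
            if (b[r][c] == 0) {
                score += kEmptyWeight;
                continue;
            }
            if (c + 1 < kBoardSize && b[r][c + 1] == b[r][c]) score += kPairWeight;
            if (r + 1 < kBoardSize && b[r + 1][c] == b[r][c]) score += kPairWeight;
        }
    }
    return score;
}

// Picks the direction whose resulting board rates highest. `after[d]` is the
// board after moving in direction d; `legal[d]` is false if that move would
// change nothing. Returns -1 when no move is legal (game over).
int pickMove(const std::array<Board, 4>& after, const std::array<bool, 4>& legal) {
    int best = -1;
    int bestScore = 0;
    for (int d = 0; d < 4; ++d) {
        if (!legal[d]) continue;
        int score = rateBoard(after[d]);
        if (best < 0 || score > bestScore) {
            best = d;
            bestScore = score;
        }
    }
    return best;
}
```

An AI built this way only ever compares boards one move ahead. It cannot give up a little space now to set up a merge two moves later.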
So, I’ll have my AI play a round of 32 games. First it will play according to the original rules where the first two tiles must combine to make the third. Then I’ll do another run where the first tile combines with itself to make the second, and the second combines with itself to make the third, etc.
The results? Kind of a surprise:
This is a run of 32 games, as played by the same AI, using the same pseudo-random sequence, under the two rule sets. The red line represents the game according to my rules. The blue one is the original rules. The higher the line, the longer the games. So in the very first game the AI – playing under the original rules – lost the game just before turn 100. Then playing the exact same game under my rules, the AI ended somewhere past 300 turns.
You can see my rules are quite a bit easier. (The games are longer overall.) But what I didn’t expect is that both rule sets are still incredibly random. My rules allow the AI to score anywhere from 150 to 525. The original rules have games that run from 50 to about 225 or so. Which one is “more random”? Original rules have a lower delta between the top and bottom of the range, although the delta is a larger portion of the average. Roughly:
- The best Original-rule games scored five times higher than the worst ones, while the top Shamus-rule games only scored about three times higher than the worst.
- The best Original-rule games were about 175 higher than the worst, and the best Shamus-rule games were ~375 higher.
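Both readings can be computed straight from the rough ranges in the text: about 50 to 225 moves under the original rules, and 150 to 525 under the Shamus rules. The third field measures spread relative to the midpoint of the range. It stands in for “portion of the average” because the actual averages aren’t given. It is an assumption, not a number from the runs. The struct and function names are hypothetical.

```cpp
// Three ways to summarise a best/worst range of game lengths.
struct Spread {
    double ratio;       // best / worst
    double difference;  // best - worst
    double relative;    // difference / midpoint (stand-in for "portion of the average")
};

Spread describeSpread(double worst, double best) {
    double midpoint = (best + worst) / 2.0;
    return {best / worst, best - worst, (best - worst) / midpoint};
}
```

This gives 4.5× and 175 for the original rules versus 3.5× and 375 for the Shamus rules. The relative spreads are about 1.27 and 1.11. So the ratio and the difference really do point in opposite directions.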
I think I’d need the help of statistics nerds to explore this further. There are a lot of ways to look at this data. The point is, I don’t know which one of these counts as “less random” in the totally subjective sense of feeling more fair to the player.
Now let’s see what happens when we make the play area larger. We’ll do the same run, comparing Original and Shamus games on a 5×5 grid instead of a 4×4.
[Screenshot: A game I played myself (no AI). Here we’re pretty close to the end.]
In case you’re curious about the text on the screen (some of which is debugging info):
- Score: The scoring system used by the original game is a little mysterious. For my program, I’m just adding up all the tiles currently in play.
- Moves: The real measure of success, in terms of appraising your strategy.
- Ruleset: Original or Shamus.
- Highest: This is used when figuring out what the next tile will be. In my game, it halves the exponent of the highest piece on the board. So if your highest tile is 256, that’s 2^8. Halving the exponent gives us 2^4, which is 16. So the “Next Tile” will give us 1, 2, 4, 8, or 16. Without this, games can take bloody ages before you start running out of room.
- AI Rating: This is how much the AI “likes” this particular board layout. More empty space=better. More combine-able pieces next to each other=better. This is just for my own debugging purposes.
- AI Movement: This number just tells me which direction[s] the AI can move. Again, debugging.
- Filled: What percent of the board is filled. The game begins with mostly empty space, but quickly rises to about 60% full. It then plateaus in the 60-70 range for the course of the game. Once you hit 85%+, you hit a tipping point where the lack of movement options leads to even fewer options, and the game usually ends.
- Playtime: This is how long the current game would take if played by a human that made a move every 1.5 seconds or so. This is important later.
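Here is a C++ sketch of the “Highest” rule. It makes two assumptions: an odd exponent rounds down when halved (the post only gives the 256 example), and each candidate tile is equally likely. Both assumptions, and the function names, are illustrative rather than details from the post.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Exponent of a power of two (256 -> 8). Assumes value >= 1.
int log2Exact(std::uint64_t value) {
    int e = 0;
    while (value > 1) {
        value >>= 1;
        ++e;
    }
    return e;
}

// Candidate next tiles: halve the exponent of the highest tile (integer
// division, so odd exponents round down: an assumption) and offer every
// power of two from 1 up to that.
std::vector<std::uint64_t> nextTileChoices(std::uint64_t highest) {
    int maxExp = log2Exact(highest) / 2;
    std::vector<std::uint64_t> choices;
    for (int e = 0; e <= maxExp; ++e) choices.push_back(std::uint64_t{1} << e);
    return choices;
}

// Picks one candidate uniformly at random (assumed weighting).
std::uint64_t drawNextTile(std::uint64_t highest, std::mt19937& rng) {
    std::vector<std::uint64_t> choices = nextTileChoices(highest);
    std::uniform_int_distribution<std::size_t> pick(0, choices.size() - 1);
    return choices[pick(rng)];
}
```

With a highest tile of 256, `nextTileChoices` returns 1, 2, 4, 8, and 16, which matches the example in the “Highest” entry.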
So if we make the game area 5×5, the outcomes look like this:
Okay, so let’s try it again on a 7×7 board:
For the record, a game of 11,000 moves or so would take you right around 5 hours, assuming you averaged a second and a half per move. (It takes my AI about 4 seconds to play through the same game.)
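The playtime figure is plain arithmetic. A small C++ helper makes the conversion explicit, assuming the same flat 1.5 seconds per move. The function name is hypothetical.

```cpp
#include <string>

// Estimated human playtime for a given number of moves, assuming a steady
// pace (1.5 seconds per move by default). Returns e.g. "4h 35m".
std::string estimatePlaytime(long moves, double secondsPerMove = 1.5) {
    long totalMinutes = static_cast<long>(moves * secondsPerMove / 60.0);
    return std::to_string(totalMinutes / 60) + "h " +
           std::to_string(totalMinutes % 60) + "m";
}
```

For 11,000 moves this comes to 16,500 seconds, or 4h 35m, which is where “right around 5 hours” comes from.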
Well, it looks like I was wrong. My rule set is somehow more random, not less. I don’t know how. Maybe I’ve got a bug or design flaw in my AI that’s keeping it from performing properly. In the end, this was less illuminating than I’d hoped.
[Screenshot: An AI game in progress. Note that the “playtime” is how long a human would take. The AI had only been playing for about two minutes at this point.]
Still, we did learn a few interesting things:
- As you might expect, making the board larger adds dramatically to the length of the game. A 4×4 takes a few minutes. A 6×6 takes about half an hour. An 8×8 takes about 5 hours. 10×10 is a couple of days. 12×12 is about ten days. (Again: This is assuming non-stop rapid-fire movements.)
- For anything larger than 5×5, I think the game needs a little something else. Some special pieces or a powerup or something.
- On larger boards, a lot of the interesting activity happens in the very last stages of the game. It’s kind of like starting a game of Tetris at level zero. You’ve got half an hour of really boring play. Then three minutes of challenge, then a minute of sheer chaos where it all falls apart. But unlike Tetris, we can’t just “start” the player near the endgame, because how they fare in the endgame is a measure of how careful and disciplined they have been at managing the board during the “boring” parts. This probably means the ideal board size is 6×6 or less. Anything larger, and it just takes too dang long before you can see the results of your efforts.
- There are a lot of interesting things you can do with the “next tile” logic. You could have it only give you 1’s and 2’s, which would make the game stupidly long and boring. But maybe my approach is too conservative. Maybe instead of 2^(n/2), it would be more interesting to use 2^(n-2), or just 2^n. The latter would mean that once you get a 256, it will start randomly giving you 256’s. That would make the difficulty ramp up quickly. It might also make the game more random.
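Here is a C++ sketch comparing the three ceilings, written in terms of n, the exponent of the highest tile on the board. The enum and function names are hypothetical. The sketch also makes two assumptions: the halving rule rounds odd exponents down, and the n-2 rule never goes below a 1 tile.

```cpp
#include <cstdint>

// The three "next tile" ceilings, as a function of n, the exponent of the
// highest tile on the board (256 = 2^8, so n = 8). Names are hypothetical.
enum class Ceiling { HalfExponent, ExponentMinusTwo, FullExponent };

std::uint64_t largestNextTile(int n, Ceiling rule) {
    int e = 0;
    switch (rule) {
        case Ceiling::HalfExponent:     e = n / 2; break;              // 2^(n/2), rounded down
        case Ceiling::ExponentMinusTwo: e = n >= 2 ? n - 2 : 0; break; // 2^(n-2), floor of 1
        case Ceiling::FullExponent:     e = n; break;                  // 2^n
    }
    return std::uint64_t{1} << e;
}
```

With a 256 on the board (n = 8), the largest possible new tile is 16, 64, or 256 respectively. That shows how much faster the last two options would ramp up.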
Then again, this post proves I’m probably bad at intuiting how “random” a system is.
I offer this post as an example of why the constant “cloning” of mobile games isn’t necessarily a bad thing. Threes! is a dead-simple game, but here I’ve stumbled on several interesting variants of number-combining that are thus far left totally unexplored. I’m sure there are other variations you could play with. You could have a half dozen of these games on iOS and each one of them would be unique and worthwhile. Or someone could put out a mindless re-skin with identical mechanics. A lot of it depends on who is making the game and why they’re doing it.
It goes back to the “We make games to make money” vs. “We make money to make games” problem. If all you want is money and you don’t care about games, then you’ll look at what’s selling and do a straight-up clone. If you love thinking about and exploring mechanics then you would probably find direct cloning to be tedious and boring. You’ll be driven to make something different – something you want to play that doesn’t already exist – and you’ll put it up for sale as a way of getting paid for your efforts.
In any case, this is a gem of a game. Lots of neat stuff to think about. Do give it a try if you haven’t already.