Video Game Chart Party

By Shamus Posted Monday Jan 27, 2020

Filed under: Video Games

I have a bit of a sore throat so I wasn’t up for doing a Diecast. But as a way of giving you your usual dose of content, I thought I’d share a strange meeting I had the other day on the internet…

EXT – The Internet – Night

Shamus is minding his own business, scrolling Reddit and generally wasting time, when Don Data enters. He’s an exaggerated stereotype of Jersey-area Italian-Americans. Maybe he’s played by the late James Gandolfini, but maybe not. It’s hard to tell in this lighting. He looks around, and once he sees the coast is clear he invades Shamus’ personal space.

Don Data:

Hey. You’re that internet guy, right? You do programming and stuff. You know about databases, right?

Shamus:

(Visibly uncomfortable.) Yeah. I mean, sort of. I’m kinda… I’m more of a graphics…

Don Data:

Yeah, yeah. Close enough. Whatever. Check this out… (Don pulls a DATABASE out of his coat and hands it over.) I figured you might want to have a look at this.

Shamus:

(Eyes the database suspiciously.) I don’t get it. It’s just a list of video games and… what is all this?

Don Data:

(Shrugs.) I heard it was, like, a list of all the Metacritic scores for the last 20 years.

Shamus:

(Raises his eyebrows.) Where did you get this?

Don Data:

(Shrugs again.) I know a guy.

Shamus:

Okay, but where did THAT guy get it?

Don Data: 

It fell off a truck.

Shamus:

A database fell off a truck?

Don Data:

(Chucks him on the shoulder and nods.) Have fun, kid.

Shamus:

Kid? I’m nearly 50… (Shamus looks up from the database to see that Don has vanished.)

So that was a little weird. Anyway, let’s open up this database and have a look.

The first thing to note about this data is that it’s only PC games, and it seems to only include games that scored over 30-ish. That works out to a little over 4,000 games. The data set is pretty small in the… hang on. I’ve got the data, so why don’t I just show you?

So the data is obviously very thin before 2000. This makes sense. Metacritic was launched in 2001. Do try to keep this in mind as we go forward. This sample size isn’t that big to begin with, and it’s microscopic in the 90s.

We always hear people say that reviewers are less critical today than they were 20 years ago. I sort of instinctively lean towards this assumption, but let’s see what the data says. Here is an average of all scores given by critics, by year:

Note that this graph is re-scaled to show the 60-100 range. Still, that’s interesting. Obviously we need to ignore the outliers in the 90s. If I’d had more time to redo these charts I probably would have made a version of the spreadsheet without the pre-2001 data. In any case, we can clearly see a sawtooth pattern there in the middle. That low spot in 2006 is the launch of the PS3 / Xbox 360 generation. We tend to get really horrendous PC ports in the early days of a new console. Plus, this was the dawn of phone-home DRM.

I see things improved until 2011, when it took a dive and we had a few bad years. I’d love to blame that on GFWL, but I honestly have no idea what that is.
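
By the way, if you’d rather do this sort of aggregation in code than in a spreadsheet, it’s only a few lines with pandas. Treat this as a rough sketch rather than my actual pipeline – the file name and column names are guesses at what Don’s database might contain:

    import pandas as pd

    # Load the mystery database. Column names here are assumptions:
    # 'name', 'year', 'critic_score' (0-100), 'user_score' (0-10).
    games = pd.read_csv("metacritic_pc_games.csv")

    # The pre-2001 sample is microscopic, so drop it before averaging.
    games = games[games["year"] >= 2001]

    # Average critic score per release year.
    critic_by_year = games.groupby("year")["critic_score"].mean()
    print(critic_by_year)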

So that’s what the critics had to say about PC gaming. Let’s see what the Metacritic user scores look like:

Again, note that the scale starts at 60 here. You can see that users rated games much lower than critics pretty much across the board. That makes sense. Critic reviews are gathered automatically, but user reviews are self-reported, and people generally only bother when they feel strongly about something. For users, the mid-aughts low point was in 2005, not 2006.

So let’s see how big the delta is between the critics and users…

This chart is basically the green chart minus the red one. Or, how many points higher are critic scores than user scores? We could see this as an indication of the diverging opinions between the people and the press, but this spike at the end could also be the result of games that didn’t turn on their aggressive monetization systems until after the review scores were set. This pissed people off and led to review bombing. So maybe it’s not a measure of the difference between critics and users, but the difference in quality between launch day, and the day the cash shop opens.
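
In code, that delta chart is just a subtraction on top of the sketch above. One wrinkle: Metacritic user scores run 0–10 while critic scores run 0–100, so the user average has to be scaled up first (again, the column names are my guesses):

    # Yearly user average, scaled from 0-10 up to the critics' 0-100.
    user_by_year = games.groupby("year")["user_score"].mean() * 10

    # Positive values mean critics scored the year higher than users did.
    delta = critic_by_year - user_by_year
    print(delta)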

As always, take everything here with a grain of salt. I haven’t spent a lot of time with the data yet and I might be missing something. Also, this was my first time creating a pivot table in Google Sheets and it’s possible I bungled it.

This project needed a little more time, but right now I really need to take a break and drink some tea. And maybe play some games from before 2016.

 


131 thoughts on “Video Game Chart Party”

  1. The Wind King says:

    Honestly, as someone with his finger barely on the pulse of games media and reviews, and just from experience…

    That divergence between Critic vs Customer looks to be scarily accurate.

    These last few years have been the years of the loot-box and the Microtransaction, which is something Critics do not need to bother with: they play the game professionally for maybe 10–40 hours maximum depending on the game length, more if it is some form of long-haul RPG, and unless the game is so stellar they come back to it day after day to relax, they don’t need to deal with any MTX scheme. Long-term players do need to engage with these predatory mechanics, especially if they’re completionists, or it’s a Pay-to-Win scheme in a competitive game.

    The divergence isn’t just “Score at Launch vs Score at MTX Store”, but also “Short Term Experience vs Long-Term Experience”, at least to me anyways.

    1. John says:

      That divergence between Critic vs Customer looks to be scarily accurate.

      I want to push back against this idea a little bit. Don’t let confirmation bias lead you astray.

      There is absolutely nothing accurate about the average user review score on Metacritic. It’s not the average score for all the people who played the game. It’s not even the average score for a random sample of all the people who played the game. Instead, it’s the average score reported by people (a) who have Metacritic accounts, (b) who were, for whatever reason, sufficiently motivated to review the game, (c) who may have chosen their score strategically rather than according to the game’s merits–think angry review bombers or hyper-enthusiastic kickstarter backers–and (d) who may not have played the game to completion or possibly even at all. There’s no reason to believe that the average Metacritic user review score and the hypothetical average all-user score are statistically similar. Whether Metacritic average scores are too high or too low is impossible to say. It likely varies from game to game. We have no way of knowing.

      The point is that Metacritic user review scores should be taken with a hefty grain of salt. Or three. Or five. Or possibly a whole shaker’s worth.
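
      To make the selection effect concrete, here’s a toy simulation. Every number in it is invented; the point is only that self-selection alone can move the average:

          import random

          random.seed(0)

          # Toy population: 10,000 players whose honest scores
          # average about 7 out of 10.
          true_scores = [random.gauss(7, 1.5) for _ in range(10_000)]

          # Self-selection: suppose unhappy players are several times
          # more likely to bother writing a review than happy ones.
          def leaves_review(score):
              chance = 0.30 if score < 5 else 0.05
              return random.random() < chance

          reviews = [s for s in true_scores if leaves_review(s)]

          print(round(sum(true_scores) / len(true_scores), 2))  # ~7.0
          print(round(sum(reviews) / len(reviews), 2))          # well below 7

      Same players, same honest opinions – but the reviewing subset tells a different story than the population does.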

      1. Wide And Nerdy says:

        People motivated to leave a review score on a game are generally going to be people who had some direct experience with the game, either the game crashing on them right out of the gate and them having to return it or them playing it for some length of time and developing a valid impression. It’s the rare game that people are motivated to review bomb without having played it. Those are notable outliers and this data is an aggregate.

        What I’m trying to say is, there isn’t rampant, across-the-board organized review bombing of all games such that it would overly affect a large dataset like this. It would affect a handful of scores out of the hundreds in any given year. In general any mass downvoting of a game is going to be earned by some bad feature of the game such as microtransactions or excessive bugginess, not by some external factor like a peripheral controversy.

        That said, I don’t think monetization is the only cause behind the spike in the divide between critic and consumer opinion.

        1. Echo Tango says:

          Even assuming 0/10s from people who haven’t played the game are rare, that still leaves room for people to give a 0/10 without actually thinking about the score the game would deserve if they put some effort into it. Reviews like “the game was boring during the tutorial and crashed once, 0/10” are very real, and reviews where people actually use the other numbers on the scale are rarer.

          1. Agammamon says:

            It also leaves open that reviewers might be affected by ad revenue and access to future titles to review.

            There’s undue influence on both sides. But review bombing is easy to detect and adjust for. And some review bombing is legitimate. Like when the cash shop opens up and you find out they’re selling bullets.

        2. John says:

          I’m sure that’s largely true, but even if we ignore review bombing, there are still selection effects at work. Metacritic user-reviewers are a self-selected bunch. They aren’t drawn randomly from the population of players; their average review score is therefore unlikely to match the hypothetical average review score for the entire population of players. In other words, the average Metacritic user review score doesn’t necessarily tell you what all players really think about the game. It might–the presence of selection effects doesn’t automatically imply that the selection effects are large–but we have no data to test that hypothesis. It might well be useful to read user reviews on Metacritic (in, funnily enough, the same way that it is often useful to read critics’ reviews) but the average Metacritic user score is an unavoidably suspect metric.

          1. Ninety-Three says:

            This is essentially the argument for trusting critic scores over user scores: a game getting reviewed by the critics has the very obvious selection bias that critics aren’t normal gamers, but at least it’s a constant bias. Shoot Guy 4, Gun Man 7 and Boom (2016) are all going to have roughly the same set of biases applied in their critic reviews such that you can calibrate relative to the critics: “I thought Shoot Guy 3 was mediocre even though the critics said it was great, should I buy Shoot Guy 4? Well it got the exact same score as Shoot Guy 3 so probably not.” User reviews on the other hand are all over the place. Gun Man 7 got review bombed because people were unhappy that the map pack DLC was too expensive, and Shoot Guy 4 got virtually no marketing so the only users reviewing it were the die-hard series fans. Worse than just having strong selection biases, the user scores can have different selection biases, making it impossible to compare between them.

            1. Wide And Nerdy says:

              I don’t know. Metacritic says Fallout 3 got a 91 from Critics and a 7.8 from users. Fallout New Vegas got an 84 from Critics and an 8.7 from users. So users think New Vegas is better and Critics think Fallout 3 is better.

              I know who I’m going to trust.

              1. Ninety-Three says:

                Are we really going to play the anecdote game? In a database of 4,000 games, I can easily counter with an X vs Y comparison where the user consensus comes out looking bad, but it’d be silly to use one cherry-picked data point to argue about broad trends.

                1. Wide And Nerdy says:

                  I just know probably 9 times out of 10, maybe more, when the critics and the audience disagree, I agree with the audience.

              2. shoeboxjeddy says:

                Users go back years after the fact and add new reviews for various reasons. For example, some users went back and review bombed any Obsidian game they could find on Steam to get mad about The Outer Worlds being an Epic Games exclusive. Likewise, some people review bombed EVERY Bethesda game to specifically express their rage about Fallout ’76. Saying “I like this number better, so this group is more trustworthy” is REALLY REALLY silly.

                1. Agammamon says:

                  Reviews of games released years ago are pretty much irrelevant, IMO.

                  Is M.U.L.E. a good game? Yes it was. Would I recommend it to a 12 year old today? No. It wouldn’t appeal to as large a base of players nowadays as it did when I was that age.

                  But I don’t think it matters. What we’re looking at here is not a choice between trusting the reviewers and trusting the players; what we’re looking at is a divergence between them – for whatever reason.

                  Maybe one is going nuts while the other is holding course, maybe they’re both going a little nuts. We can’t know from what we’re seeing here.

          2. Wide And Nerdy says:

            I’m just going to reply to both you and Echo Tango here. The phenomena each of you are describing were as true in 2006 as they are in 2016. Users self-selected in 2006; users gave zeroes for “game crashed once” in 2006. And they did so every year of this data, so we can still compare the years and see trends in the data.

            Looking at trends in the data should still tell us that there’s something going on between consumers and critics.

            If this was just random user abuse you’d expect to see more randomness but the last seven data points show a consistent upward trend. And even if you normalize the line to the 2011 point, the overall trend is upward well beyond all the other datapoints on the graph. User opinion and critic opinion are diverging.

            And I feel like we’re not just seeing it in games. There’s a lot of stuff on Rotten Tomatoes where the user score is wildly different from the critic score. I was going to give examples but they’re kind of polarizing so best not.

            1. John says:

              We aren’t seeing “something going on between consumers and critics” because we aren’t seeing consumers. We’re seeing people with Metacritic accounts, which is not the same thing. Those people are, at best, a subset of consumers and probably not a representative subset at that. If you’re interested in the difference between Metacritic users and reviewers that’s fine, but in the absence of data it’s statistically unsupportable to conflate Metacritic users with consumers more generally.

              1. Wide And Nerdy says:

                I am interested in that difference and think it’s meaningful whether or not you accept a connection between that and consumers. I know you think Metacritic users are this weird group of mutants but I happen to think they’re as normal as the rest of the consumer base and that they probably reflect the consumer much more closely than, say, critics do.

                1. John says:

                  I never called anyone a mutant. I never called anyone abnormal. Please do me the courtesy of assuming that my desire for good statistical practice is not driven by hostility or malice.

                  1. Hector says:

                    I am not certain that has any practical value. While you can point out some reason to not trust metacritic user reviews, we have no superior source and just as many reasons to use the data we have. So your statement, while it may be true or valid, is not really useful in this context. In particular, we probably do want to deliberately weigh users with special motives in some circumstances, and based on some observations it appears that Metacritic does so, though they have not opted to clarify how.

                    1. CountAccountant says:

                      I understand what you are saying, but I think it’s only fair to keep the validity of the metacritic scores on the table for this particular discussion.

                      The overall question of the discussion is “Why did critics scores and consumers scores diverge in recent years?”

                      –One family of explanations assumes that the actual opinion of the typical consumer changed (relative to the opinions of critics)
                      –The other family of explanations assumes that the opinion of the typical consumer did not change, and that the numbers are the result of a change in the likelihood of certain people to review a game (for whatever reason)

                      We don’t have enough data to know which is correct. That’s what makes the discussion fun! But if we insist on taking the metacritic scores at face value, we essentially shut out the second family of explanations and limit the people who hold those views from participating. I think we can have a more robust discussion if we keep the trustworthiness of the metacritic scores on the table for analysis and debate.

      2. evilmrhenry says:

        Yeah, the rise of review bombing as a technique should not be underestimated.

        1. Wide And Nerdy says:

          The rise of website censorship in the name of stopping review bombing can’t be ignored either. Lots of legit reviews get taken down in the name of stopping “review bombing.”

          1. shoeboxjeddy says:

            So your argument is that user reviews are even more untrustworthy in the wake of that, correct? Since some of the legitimate reviews were removed and can’t be read? Because you seem to be arguing for how useful the user metrics should be, but then giving points to the contrary…

            1. Wide And Nerdy says:

              No they’re still trustworthy because we have watchdogs keeping an eye on that stuff.

              1. shoeboxjeddy says:

                Explain how this works, then. First, let’s make a distinction between a quality review and a 0 quality review (from here on, a 0-review). A quality review will say “here’s what I liked, here’s what I didn’t, etc etc.” whereas a 0-review will say “I HATE this game 0/10” or “I LOVE this game 10/10” with no other discussion or information. Let’s say hypothetically, 20 reviews that were quality are removed along with 100 0-reviews. Who are the “watchdogs” keeping an eye on stuff? What group is that? Did they read all 120 of those reviews and keep records before any of them were removed? Can the records be accessed by anyone? Those records could possibly be useful for a source of quality reviews, but is that a known resource to anyone?

                Like… what are you talking about here, basically?

                1. Wide And Nerdy says:

                  Those “I hate this game” “I love this game” scores are valid. They should not be required to write a full review in order to justify their score; they’re users, not critics. Maybe they really feel the game is a zero out of ten.

                  I know a lot of people felt that way about Fallout 76. It was terribly buggy (by some accounts Bethesda’s buggiest game ever), increasingly monetized, some had issues with the PVP, many had issues with the lack of NPCs, and there were other issues. Bethesda broke a lot of promises and delivered a game that was fundamentally not Fallout. That warrants a zero out of ten, and not every gamer should have to write out a full-length explanation.

                  1. shoeboxjeddy says:

                    Okay, how about they’re valid scores (your opinion is whatever you say it is) but worthless reviews? You do HAVE to write out a full length explanation if you choose to REVIEW the game. That’s what a review is. When a user decides to access the user review system, they’re putting on their critic hat. If you don’t want to put that much thought into it, that’s fine. You don’t want to write a review, so… don’t do that.

                    1. Wide And Nerdy says:

                      If you concede they’re valid scores then what’s the argument? User opinion differs from critic reviews. That’s what the chart says. That’s all I think anybody ever thought. But that’s a valid consideration. How the game made the user feel doesn’t line up with how the critics are reviewing the game, and the divergence has been growing the last several years. There has to be a reason for it, and no, it’s not review bombing, not on this scale.

                    2. shoeboxjeddy says:

                      *Citation needed* You’re both arguing that review bombs are all totally valid reviews (which is ridiculous) AND that review bombs haven’t really changed the averages to a noticeable degree (which is statistically false. It is a straight up FACT you are incorrect).

            2. Agammamon says:

              I think his argument is that there’s roughly equal distortion between ‘distortions’ and ‘distortions to remove distortions’ in opposite directions so they cancel each other out.

              Even without that though – we have no idea what’s happening with *reviewers*. What distortions are going on there and what’s being done to cancel those influences out?

              Right now, we have no hard evidence to lead us to choose one over the other as ‘more reliable’.

          2. Ninety-Three says:

            Which sites are taking down lots of legit reviews in the name of stopping “review bombing”?

            1. Wide And Nerdy says:

              Rotten Tomatoes for one. I’ve experienced it personally.

            2. Agammamon says:

              Rotten Tomatoes has been caught dropping negative user reviews – in one of the recent Dr Who episode reviews.

              They’ve also pulled user rating aggregates for several high-profile movies that viewers panned – Capt Marvel for one. Now, yes, you can make an argument that that was a legitimate pull, but it was still a manipulation. They do it, they’ve been seen to do it.

              1. shoeboxjeddy says:

                You’re calling that a “manipulation” but the removal of junk data is an attempt to CORRECT for manipulation. Users are NOTORIOUS for manipulating the number for years now, and then they have the hilarious nerve to get mad when the group responsible for posting the rating tries to remove bad actors?

                1. Wide And Nerdy says:

                  Where’s your evidence of this conspiracy to manipulate user review scores?

                  1. shoeboxjeddy says:

                    I can’t tell if you’re being serious or not. Getting mad at a game for something (let’s say loot boxes) and then rating it zero is a clear attempt to bottom out the review score average. If you asked the person, they would say that’s what they’re doing. So it’s not so much a conspiracy as “something they are openly admitting that they are trying to do.”

                    1. Wide And Nerdy says:

                      But a game that implements loot boxes objectively deserves a zero, in all cases.

                      Maybe some people are having fun in games like that, but kids can have fun making pies out of mud. We don’t call the mud a well-designed game.

                    2. shoeboxjeddy says:

                      “But a game that implements loot boxes objectively deserves a zero, in all cases.”
                      I’m sorry, I had been taking you seriously all this time. That was clearly my error, now that you’ve described your personal belief as an objective fact.

                2. Agammamon says:

                  A manipulation is a manipulation. That someone is modifying the data must be taken into account – if only to remind ourselves that it needs to be checked to make sure that it’s only a cancellation and doesn’t go further.

    2. Agammamon says:

      You’re seeing it in other media too. Movies, tv, and streaming.

      A lot of media has launched over the last few years with a lot of hype and great critical reviews, only to get mediocre to bad scores from users. STD, Dr Who (the current one; the score for the overall series is high for both critics and viewers), Batwoman, etc.

      There’s just in general a divergence between what those paid to review will say about something and what people paying will say.

  2. Lino says:

    Hope you get well soon! The article was very interesting!

    In terms of the data, I think the difference between user and critic scores is quite telling, and it would have been interesting to see what genres you have in the sample and whether some genres are more susceptible to this difference of opinion or not. It would also be cool to see which genres dominated over the years – not only in terms of number of games that came out, but also in terms of which genres scored the highest (and if the data doesn’t have genre, maybe you could use some cross-referencing with Steam DB in order to fill the data out).

    Is this data set public by any chance, or is Don Data picky about his clientele?

  3. GoStu says:

    Interesting data, particularly the last table.

    The outliers pre-2001 make sense to me. The only things that’d be worth going back to include in Metacritic from before its official launch are probably memorable, standout titles. Games that were forgettable, mediocre, or otherwise uninteresting probably aren’t worth the bother to enter into Metacritic.

    That differential at the end between “Critic Score” and “User Score” is telling, I think. Disregarding the critic’s high point at 2010, critic reviews have trended upwards across the last decade. User reviews kept going up for 2011 – 2014 and have slumped ever since. The delta between critic score and user score widens every year after 2013.

    Gut feeling is that there are a lot of critical perfect 10/10 reviews bloating scores – and critics *very rarely* get to dip into the sub-6/10 range. Gods help the poor critic who feels that the game currently sponsoring his or her employer isn’t a good one, either; I’ve never forgotten the debacle after Kane & Lynch’s review score got a reviewer the boot.

    1. John says:

      Everyone thinks that critic review scores are too high, but heaven help the critic who gives a lower-than-desired score to some player’s favorite game. I remember when the TV show X-Play gave a 9/10 to some Resident Evil game and got a flood of angry “How dare you! This is obviously a 10/10 game!” letters . . . from people who could not have played the game because it hadn’t gone on sale yet. If review scores are skewed upwards, they are skewed upwards for a lot of reasons.

      1. Agammamon says:

        God help you if you have the slightest criticism of a Zelda game.

  4. Eric says:

    Any chance you could show us what the median scores were per year in comparison to the average?

    1. Geebs says:

      Exactly, the rise of shovelware since the PS2 era makes the mean average very prone to bias from skewing of the distribution.
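
      A toy illustration with made-up numbers (not from the actual dataset):

          from statistics import mean, median

          # A hypothetical year: a core of decent games plus a low
          # tail of shovelware dragging on the distribution.
          scores = [80, 78, 75, 74, 72, 70, 35, 30, 25]

          print(round(mean(scores), 1))  # 59.9 -- pulled down by the tail
          print(median(scores))          # 72   -- stays with the typical game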

    2. Echo Tango says:

      Mean + median + mode plz. Just giving us the mean isn’t super useful. :)

      1. John says:

        To heck with your measures of central tendency. I demand a full histogram!

        1. Echo Tango says:

          Oh yeah, this could be multi-modal…

  5. Ninety-Three says:

    This pissed people off and led to review bombing. So maybe it’s not a measure of the difference between critics and users, but the difference in quality between launch day, and the day the cash shop opens.

    It’s important to note that if your theory is right, it’s not the difference between launch day and cash shop day, it’s the difference between launch day quality and an arbitrary number that angry people are trying to lower. The archetypal microtransaction-fueled review bomb has a bunch of people leaving scores of zero without thinking the game is literally a zero out of ten: they’re just lashing out, trying to vent frustration or hit the studio in the review score. This means the score is no longer measuring anything about the game’s quality and can’t be usefully compared to anything.

    1. Echo Tango says:

      That, and the fact that only sufficiently-motivated people generally leave reviews, are some reasons why Steam and Facebook only do thumbs-up or thumbs-down. If we could trust everyone’s scores to be how they actually viewed the game, we could use them for averages. Since only the positive/negative is true, and the amount is broken, those numbers (and the ratio between them) are about as good as we get.

    2. Wide And Nerdy says:

      If a game has microtransactions it warrants a zero out of ten. No exceptions. Microtransactions fundamentally ruin the games they’re implemented in. It’s impossible to design a good game for anybody but the rich with microtransactions, and even the rich are unduly inconvenienced midplay depending on the implementation.

      1. Ninety-Three says:

        Quick, you’d better tell the millions of people enjoying Overwatch that they’re wrong, the game is objectively ruined for everyone that doesn’t spend hundreds of dollars on it.

        1. Wide And Nerdy says:

          I repeat. No exceptions.

            If they’re finding some way to enjoy a fundamentally ruined game, good for them. But it’s a 0/10 because it uses microtransactions.

          1. Ninety-Three says:

            As shoeboxjeddy said below, user reviews are intentionally bullshit. Thank you for providing an illustration of the phenomenon, I was afraid simply describing it might look like strawmanning.

            1. Wide And Nerdy says:

              People might be able to have fun building a house out of matchsticks. We don’t call the matchsticks a well designed game. People can have fun with things that are objectively terrible games or non games.

              Lootboxes and microtransactions are exploitation. Exploitation warrants a zero out of ten.

          2. Ninety-Three says:

            I can do better than the above snark, let’s try engaging seriously.

            Imagine I have a review system where the more I enjoyed a game, the lower a score I give it. Portal is a 0/10 and Aliens: Colonial Marines is a 10/10. If I were to go to Metacritic with my system and say “Best game ever, 0/10” then I would be using Metacritic wrong. The point of the numbers is as some kind of shorthand for quality, it’s simpler than making everyone select from a dropdown list of options that range from “perfect” and “excellent” to “terrible” and “gave me cancer”. Metacritic’s system isn’t inherently more valid than mine, there’s no particular reason high numbers should be good instead of bad, but everyone in gaming understands that high numbers are meant to indicate goodness, and if I show up using my system then I am communicating badly.

            That’s what you’re doing here. You have invented this novel review system where microtransactions are an automatic 0/10, and you can invent whatever system you want, but trying to plug it into the normal discourse where everyone’s speaking Metacritic-ese is not helpful. We’re all over here trying to talk in a language designed to inform purchasing decisions by conveying information about enjoyment, and you showing up to talk about how much you hate microtransactions is missing the point. I’m sure you do hate microtransactions, but that’s not what the numbers are for and it’s not what anyone else uses them to express. You might as well go to France and start speaking Spanish.

            Now I must expand upon what shoeboxjeddy meant when he said user reviews are intentionally bullshit. I don’t think the above situation is a mistake you’ve made; you didn’t travel to France under a sincerely confused expectation that the locals all spoke fluent Spanish. Everyone knows that Metacritic’s system is about ranking games on something like enjoyability, and you acknowledge Overwatch is more than zero enjoyable. But you really hate microtransactions, you hate Overwatch for having them, and you want to express that. Of course, “fuck microtransactions” is not a new message, so it’s neither satisfying nor effective to be the ten millionth person posting those words to the internet. But reviews get aggregated into a metascore, and if enough people leave negative reviews, the number will go down. So you violate the norms of the review system and leave a 0/10, not because you’re trying to be a helpful and contributing member of the enjoyability-measuring discourse, but because you’ve found a channel that hasn’t run out of bandwidth to have “fuck microtransactions” shouted into it.

            We invented these numbers that are good for conveying information, and every rageposter shouting into that system drowns out the signal. If you’ve ever hated a reviewer for giving a game a high score because it agreed with their politics instead of caring about how the game plays, this behaviour is the exact reflection of that. Stop it, you’re making reviews less useful.

            1. Wide And Nerdy says:

              That’s what you’re doing here. You have invented this novel review system where microtransactions are an automatic 0/10, and you can invent whatever system you want, but trying to plug it into the normal discourse where everyone’s speaking Metacritic-ese is not helpful.

              I expect, for the purchase price of the game, the complete content of the game. I don’t expect to pay 60 dollars for a game that has jackshit in it, then pay 10 dollars per character for the characters I actually wanted that should have been in the damn game to begin with, and 5 dollars per item* for every piece of armor or weapon I want that should have been in the 60 dollar game I purchased. And if I don’t pay these prices I get stuck with crappy characters and cheap looking shit.

              And that’s not even talking about lootboxes, which are worse: you can pay over and over and over again and never get the item you actually want. It’s a business’s wet dream.

              I engaged with a lootbox system one time, Mass Effect 3. I replay that game periodically and I grind the multiplayer because I’m a singleplayer completionist. I wanted a Volus. I’m not going to tell you how much I spent to get it. It was too much.

              Never again. Any game that has a lootbox system is an automatic no-buy for me, and THAT makes it a zero out of ten. Because the game negates its existence by having that system in it.

              *And if you actually play these games, you know damn well the prices are often much higher than that. In World of Warcraft you can pay 25 bucks for a flying hippo mount. In Elder Scrolls Online you can pay, I think, a couple hundred dollars for some of the larger player houses, and that’s not including the cost of furniture, nor the fact that you need a 15 dollar a month subscription to have enough furniture slots in your house to properly furnish it, if what I’m reading is correct.

              1. Boobah says:

                There’s a difference between microtransactions that ‘literally make the game unplayable without a bottomless wallet’ and ones where ‘my character that I only see during the match summary screen isn’t as pretty as I want it to be.’

  6. Skyy High says:

    Another possible explanation: reviewer scores have mostly stayed constant for the last five years, averaging around 72, and the five years before that they stayed relatively constant, averaging around 70. I would probably chalk that up to editorial review and quality control for professional publications, attempting to make sure that games are reviewed relatively consistently specifically to avoid the type of criticism that Shamus alluded to about reviewers being less critical today.

    But, and this is the part that is completely anecdotal and speculative, in my experience internet culture has become more fractured and negative in the past 5-10 years. This is reflected everywhere, not least in the seemingly increasing popularity of review-bombing. It really looks like the increased gulf between “user” and “critic” is due to user reviews dropping, rather than critic reviews substantially changing.

    Now there’s an argument to be made that games, in fact, ARE getting worse, and that critics are simply not updating their metrics to account for things like microtransactions and customer-unfriendly practices. On the other hand, we’re mostly past the dark times of spyware DRM being considered standard practice; at least today’s consumer-unfriendly gaming trends are out in the open (cash shops, day-one DLC) rather than hidden in the guts of your machine. And also, considering the rate of inflation and the increasing budget of games, it does need to be said that games should cost more than what they cost in 2000, but they don’t. The market won’t bear that. So it’s not inconceivable that a company could design a game in a way that they can extract more than the standard $60 from players, while also making that game honestly equal in quality to a game that was released for a flat $60 20 years ago.

    Tracking the quality of games vs their price points vs customer reception could be an entire thesis.

    1. Lino says:

      There is absolutely zero evidence to suggest game expenses have gone up in any meaningful way. I don’t have the time to do a detailed post, but the gist of it is:
      1. Engines are cheaper than they’ve ever been – licensing fees are peanuts compared to creating your own engine (which you were forced to do 20 years ago)
      2. There has been a dramatic rise in various types of middleware and companies that support video game development in various ways – lowering expenses even more
      3. The market is bigger than it’s ever been – there are more people buying games, so you can recoup your investment much more easily
      4. There is literally no evidence of expenses going up – just look at the 10-K forms of any of the publicly traded video game companies – EA, Blizzard, Take Two, NetEase, Bandai Namco… none of them show any meaningful rise in expenses – some even show a drop in expenses in recent years
      5. Game prices HAVE gone up – exchange rates have made games more expensive in various markets
      6. Most important of all – business doesn’t work that way. The idea of “Game prices should go up” assumes that it’s the consumers’ responsibility to keep the gaming industry afloat. But business doesn’t work that way. As a company, it’s your job to offer consumers a product at a price point at which you can make a profit. If you can’t, you should raise your prices. If you can’t make money at that price point, then that is a price point the market can’t support. The fact that you can’t make a profit is YOUR problem – not the customer’s.

      As an aside, note how companies saying “Game expenses have gone up, ergo we need to raise prices”:
      A) Say that in front of games media. However, when they do, they never share any data about this supposed rise.
      B) At the same time, they never mention rising expenses during their investor calls or in the forms they are required to publish by law, because lying in any of those avenues is an offense punishable by law.

      1. Asdasd says:

        Great analysis, especially regarding point 6. It’s a market! They can only charge what people are willing to pay – if the ‘correct’ price of games were higher, we would be paying it.

        Instead games are cheaper than ever, through sales and bundles and game passes and Epic giveaways. Why? Because companies are following the profit, and it’s leading in that direction.

        Something else that’s often forgotten – with the rise of digital, you no longer have to physically manufacture, package, or distribute your games. You don’t have to buy shelf space with a retailer and you don’t have to give them a cut of your revenue. That the day-one price of digital copies of games is at parity with physical means companies will have offset decades’ worth of inflation.

        1. Lino says:

          Good point about digital sales! That’s one of the big reasons we’ve seen such a huge influx of indies – the barrier to entry is lower.

          Which is something that doesn’t happen often – if the expenses in an industry are rising, you would expect to see fewer new entrants to the market, as the existing companies struggle to make a profit and either go out of business or divest and try to enter new industries, while the rest of them choose to remain and fight over an ever-dwindling market share.

          1. Richard says:

            Well, you do have to give the retailer (Steam, Epic, et al) a cut of your revenue, and you do “need” to buy ‘front-of-shelf’ rights to push your game above-the-fold, get it into an announcement etc to make sure that the digital storefront shows your game to the target market segments.

            There are so many games that practically nobody will find yours during the crucial first couple of weeks after launch. Without that discovery there’s no word-of-mouth advertising, and thus your game is basically DOA.

            Epic’s giveaways are skewing things somewhat – the developer is apparently still getting at least some revenue, but how are Epic choosing what they’ll give away?

            After all, pretty much everyone with an Epic account is going to take them up on the offer, and thus Epic will pay the developer quite a lot. One assumes not the 70% they’d normally get, but it’s still a hefty chunk of effectively guaranteed revenue.

    2. GloatingSwine says:

      Also, review scores can reasonably be *expected* to stay relatively constant, because games are being reviewed on a scale that is roughly determined by the rest of the market into which they’re launching. A “70” game in 2002 is not being judged by the same criteria as a “70” game that came out in 2020, because the first is being judged against the rest of the field of 2001/2 games and the second against the field of 2019/20 games.

      (That said, review scores are useless and nobody should pay attention to them. The valuable bit is the words that explain *why* the reviewer has the opinion they do.)

  7. Chris says:

    I could attribute it to two things.

    1) Which I feel is the least accurate way to explain it: reviewers and audiences start having diverging tastes and thoughts on what games should be. So stuff like “walking simulators” gets scored high by reviewers while actual players feel they’re just artsy BS. My evidence for this is the whole The Last of Us “Citizen Kane moment of videogames” type of reviewing, instead of just looking at whether the game is good or not.

    2) I feel this is more likely: reviewers score from 7 to 10, while audiences score from 0 to 10. Partially because of publisher pressure to be kind to a game, partially because of players generally being pissed about a game not being good and thus giving it a 0, rather than a 5/10. Fallout 76 is my example for this: a game which is dreadful but still gets okayish marks from reviewers, but not from the audience. And COD, which gets 9/10 “it’s okay”s from reviewers while audiences think “it’s basic and not interesting, so here’s a 7/10”.

    1. Asdasd says:

      The other thing that I’d point out is that Metacritic isn’t nearly comprehensive in assigning a Metascore to every game released (a knock-on effect of there being more games released now than ever before, but no commensurate increase in critical comprehensiveness).

      What this tends to mean is that high-profile games become landmarks in the yearly release calendar and attract a lot of attention, which gets kind of noisy. A user score average can be impacted by a viral incident which might have no bearing on its actual quality (someone releases a video of some memeable, but superficial, visual glitches), but on the other hand might have a preponderant one (delayed-opening microtransaction store; DRM; game-breaking bugs). Camaraderie within the games press and the invisible hand of marketing/PR spend tends to limit the skew on critical averages.

      Meanwhile, off to the side, user reviews on platforms such as GOG and Steam can be a valuable metric where no such distortive attention is drawn. Sometimes they’re not just the best guide for the wary consumer, but the only one. Titles with thousands of reviews that don’t even have a Metascore, due to disinterest from the critical sphere, are not uncommon.

      1. John says:

        I don’t fully trust user reviews on GOG for older games, at least not in the aggregate. Those reviews tend to be heavily nostalgia driven. “I played this game for three billion hours when I was thirteen and life was awesome. It is objectively the best game ever. Ten out of five stars.” It helps that GOG also reports a “verified owner” average score, because you can at least be reasonably certain that those people have purchased the game recently. Even so, there’s no guarantee they’ve played it recently–I know I have a GOG backlog–or that their review score isn’t ninety percent nostalgia. As with professional reviews, I find that it pays to ignore the score and read the words. Look for reviews with something substantive to say. Skim over anything less than a paragraph long unless it includes such warning signs as “doesn’t run on my system” or “made my computer explode”.

        1. Dreadjaws says:

          A negative part of GOG not enforcing the use of a client is that, unlike Steam, it cannot measure how long a user has played the game. In any case, it pays to read the reviews. If they only say stuff like “Best game ever” or “It sucks” they can and should be ignored.

          Of course, the fact that people tend to ignore the writing anyway and only focus on the score is a whole different problem.

        2. jpuroila says:

          Given how common shills and bots are, you should be highly sceptical about any aggregate review scores you see online. What I typically do is look at a couple positive reviews, a couple negative reviews, and then dig a little to find a few reviews with 3-ish stars. If someone can find both substantial negative and substantial positive things to say about a game(or other product), what they say is probably going to be worth paying attention to, even if you don’t agree with them.

    2. shoeboxjeddy says:

      See this is where you tip your hand a bit too much. “So stuff like “walking simulators” get scored high by reviewers while actual players feel they are just artsy BS.” Why are the two groups “reviewers” and “actual players”? Obviously, reviewers ‘actually played’ the games. If this was just bad term choice, I would say the second group should be “regular consumers” or something. But the framing of what you’ve said makes it clear you’re in some kind of extreme opposition to reviewers.

      1. tmtvl says:

        Cuphead.

        EDIT: Also, Shamus once talked about how and why reviewers think differently about games, it having to do with reviewers playing so many new games so quickly in succession that they don’t have the same view of them as regular players.

        1. John says:

          Um, what about Cuphead? That’s an honest, serious question. The only two things I know about Cuphead are that it looks like a cartoon from the 30s and that it’s supposed to be pretty hard.

          1. tmtvl says:

            There is a notorious video of a Cuphead review where the reviewer couldn’t wrap their mind around the concept of using the dash mid-jump. During the tutorial, with a helpful hint on-screen.

            1. GloatingSwine says:

              Notorious, but not accurately described. It was not a video of a review, and was not presented as such or in the context of a review.

              1. shoeboxjeddy says:

                Yeah, that was a preview video from a promotional event. And the gist of what that professional critic said was: “I found this game too difficult for my personal tastes… but it was very beautiful to look at and better players will likely love this difficulty.” That’s a completely fair overview from an unskilled player who wouldn’t play this type of game themselves. I’ve seen A LOT of discourse from regular old players who found Cuphead FAR too difficult to enjoy, so it’s not like this unskilled player is some sort of replicant. There are people who wouldn’t be a good consumer for Cuphead because it’s beyond their abilities and outside of their taste.

          2. Syal says:

            I think it’s this playthrough by Dean Takahashi where he spent half an hour dying on the first level.

      2. Agammamon says:

        Reviewers get paid to review and they often don’t get paid to review games in the genres they want. They might not even be gamers. They might not even *like* videogames. But you gotta do what you gotta do to pay the light bill.

        It’s one of those distortions on the reviewer side that skews scores: the dude into RPGs is reviewing an FPS (because it has ‘RPG’ on the box) and is bored as hell, while the person angling for a job at The Washington Post thinks that walking simulator is the neatest thing they’ve ever seen.

        I mean, there’s a reason *Myst* is considered one of the greatest ‘games’ of all time.

        1. sheer_falacy says:

          Ah, the scare quotes. Because you don’t think Myst is a real game. Odd.

          And the inherent assumption that someone who likes a walking simulator must not be a real gamer.

          1. Asdasd says:

            They’re all games. 100% of walking simulators are games. Myst is a game. This has no bearing on the very real possibility that people for whom games are the daily grind*, rather than a hobby, have a strong disposition towards novelty, and that this might have some bearing on the question of why their aggregates diverge from users’. No need to drag the culture wars into it.

            * I imagine QA testers would show similar predilections towards respite from formulaic, mainstream games.

        2. shoeboxjeddy says:

          If you play a ton of games for work, you are a gamer. If you instead mean “reviewers don’t necessarily shape their identity around a love of video games and the related subculture”… that’s fine? Good even. There are PLENTY of fanboy reviews for whatever thing you’re looking at. A layperson review is a good resource to have, because everyone is a layperson in some subjects. You might be an expert in strategy and 4x games, but a complete novice in competitive fighting games. The novice review of Street Fighter V would be a good resource if you might enjoy that particular game.

        3. Ninety-Three says:

          I mean, there’s a reason *Myst* is considered one of the greatest ‘games’ of all time.

          Myst sold six million copies, in the nineties. It was the best selling videogame for an entire decade. Implying that its accolades are the product of some kind of disconnected ivory tower critics is as insane as calling The Empire Strikes Back “Oscar bait”.

          Maybe don’t call people fake gamers until your own knowledge of the medium is a little better.

          1. aitrus says:

            Heaven forbid any game accessible to the average person be considered “great”. The utter terror! Myst was a hit in part because it was a game anybody, literally anybody who knew how to handle a computer, could pick up without having deep prior knowledge of the language of video games that’s required to play most games. It’s good that things like that exist! The hobby doesn’t have to be so insular.

      3. Chris says:

        “See this is where you tip your hand a bit too much”
        You’re talking like I’m trying to hide something and I just got caught out. And I do think that reviewers have a tendency to ascribe more value to games than they really should.

        1. Chris says:

          “See this is where you tip your hand a bit too much”
          You’re talking like I’m trying to hide something and I just got caught out. I used “actual players” since I feel the way customers consume the product and how reviewers consume the product are different. The customers pay money and have to weigh how much enjoyment they got out of their buck, as opposed to paid reviewers, who I feel might try to ascribe more to videogames than there really is. Is Gone Home just an interesting experiment in storytelling in videogame form? Or is it the start of a breakthrough that pushes the envelope? As a customer you probably mostly care about what you actually got, while reviewers might try to see trends and use terms like “this changes everything”. As a result they try to anticipate these watershed games, giving them high scores, while a person who just buys the game, sees if he enjoys it, and does not think about the further implications might think it isn’t that good.

          I don’t think the way reviewers review is necessarily bad. If they put it in context then reading the review back 10 years later is a lot more interesting. Reading Halo CE reviews, for example, is interesting to see how they thought about the 2 weapon limit and how it changed console shooters.

          For some reason the system didn’t allow me to edit my comment

    3. aitrus says:

      Hey there. “Actual player” here. I like walking simulators and all kinds of “arty BS”. The “reviewers versus players” binary is, in actuality, “specific subset of reviewers versus Me + the other players I assume must all have the same priorities as I”, which is a really exclusionary and alienating attitude to take towards other people who love video games.

      That being said I agree an obsession with finding “the Citizen Kane of video games” or “the one that is True Art” is an unproductive, pointless way of looking at games.

      1. Sleeping Dragon says:

        Yeah, I will admit that I had a bit of a knee jerk reaction to the “actual players” thing; good thing we have a fairly chill community here and can address things that bother us like adults, so… I do think this actually factors into the topic at hand somewhat. Not long ago we had an (admittedly recurring) discussion in the comments on the definition of video games and how the term is perhaps not applied very accurately, but we don’t have anything better that’s handy. Now I’m a fan of walking sims and artsy games, and it would be very easy to go down the path of butthurt, arguing that “codblops crowd are downvoting me precious feels games”*. While that might have happened, I’m willing to bet it would only be a few high-profile cases. However, I do think the fact that we’re holding all these diverse titles under the umbrella term of “video games”, and that this has encountered some resistance from a subset of gamers, does lead to increased distrust in the reviewers, which leads to increased weight on user opinions, which motivates people to do stuff like start their own game-review youtube channels or create Metacritic accounts. And as has been pointed out numerous times, you’re more likely to go shouting your opinion from the rooftops when you’re in disagreement.

        *Though I guess we all have our outliers because for example I don’t get the infatuation with Gris, which felt very vacuous to me, to each their own.

  8. shoeboxjeddy says:

    The problem with user reviews is that they’re bullshit, intentionally so. People angry about “Dexit”? Pokemon Sword gets a ZERO out of 10. Do they really think it deserves zero points? Or just a harsh score as compared to the game’s quality otherwise? By the same token, fans of Pokemon who feel that claims about Dexit are crazy? Pokemon Sword gets a TEN out of 10. Is it REALLY a perfect game experience just because other fans are too mad about it, in your opinion? So we often have factions of people using the score as a weapon instead of as their actual opinion. Critics don’t do this, so their scores don’t look like this. If a critic thought that Pokemon Sword was unimpressively samey as compared to the possible innovation from moving to a console, they’d probably give it a 6, rather than a 0.

    1. Geebs says:

      I completely disagree. There are plenty of well thought out and comprehensive user reviews, which address aspects of games which are important to me, and which might not make it into a professional review. These are easy to find on whichever platform you’re looking at, with the exceptions of metacritic and the EGS. A few poo flingers shouldn’t be used as an example to discount the efforts of decent human beings.

      1. Ninety-Three says:

        When those poo flingers have their votes aggregated into the same number as the decent human beings, it really does discount the efforts of the latter.

        1. PPX14 says:

          Not if those flingers legitimately do feel that way. Who are we to say that their opinions are invalid? Yes the game works, yes it would be fun for a newcomer, and most people might say this deserves an X/10. But for these people, the game has transgressed in the context of their own personal enjoyment, to the point that it deserves in their mind a Y/10, and is worthless. For example, I hated Tomb Raider 2013 even if I found the mechanics fairly fun. I could either average this out into a score, or go with my primary feeling about the game. And if I’d been waiting for a Kotor 4, but then they make one and it is a mobile game with lootboxes, then even if it works fine as what it is, to me it is useless. I think as long as the scores come with a little explanation this shouldn’t be an issue.

          1. shoeboxjeddy says:

            Here’s the thing: perhaps you find useful information in a review that says “Because Crash Team Racing added microtransactions in a post-launch patch, the game is literally poo poo trash to me. I shit inside of the case and then burned the case. 0/10.” The issue isn’t that this person is insincere or a liar. It’s that they’re overreacting and not giving me really useful information about this game, save that they personally HATE microtransactions with the hate of a thousand suns. If I personally am more okay with ignoring microtransactions, would I find the racing fun? The 0 review is pretty worthless for discovering that. By the same token, a 10/10 review for Kingdom Come: Deliverance that only talks about political issues outside of the game and never mentions the length of the game, the mechanics of the game, how challenging it is, etc. would not be very informative either.

            1. PPX14 says:

              Ah, I see – but then anyone’s opinion or point of view which isn’t the same as your own is useless, to varying degrees depending on the strength of that opinion. They’re only “overreacting” by your standards. Oh, is that your point? That ultimately the user scores are useless, and only their written content has value? Whereas critic scores try to appeal to the majority opinion, so are typically more useful?

              1. shoeboxjeddy says:

                A point of view different from my own could be very valuable… but not if that’s all there is. It would be equally worthless if they reviewed Farming Simulator ’17 by ONLY saying “John Deere has been added to the game, 10/10!” A brief review is fine; a random drive-by that simply increases or decreases the score is useless to pretty much everyone involved.

                1. PPX14 says:

                  But that’s my point: that random drive-by might well be a good representation of people’s opinions. They shouldn’t necessarily be disregarded as hokey outliers. If they were just that, they wouldn’t affect the score much – like the few people raging in a 97% upvoted Steam game review list. And if they are more than that, then they will, and clearly enough people dislike or like the game enough to affect the average score. I’m not sure what more one could expect from an average score.

                  1. Ninety-Three says:

                    The random drive-bys shoeboxjeddy proposed aren’t even a good representation of their own opinions. The guy who gives it a 0/10 because they patched in cosmetic microtransactions doesn’t actually think the game is zero fun, and “John Deere has been added to the game, 10/10!” seems to be evaluating the game on a scale of how much it contains John Deere equipment rather than how good it is. The average score only means anything if it’s averaging reviews that are all made under the same scoring system; if some of the reviews are trying to assign numbers to enjoyability and some are giving out zeroes or tens on a whim, they will add up to nonsense.

                    1. PPX14 says:

                      Yeah I guess it depends what you read into an aggregate score in the first place. For me, the new Star Wars are all 0/10 as Star Wars films, at most 2/10, and therefore for me, that’s what they are. Nigh valueless rubbish. But I can also pretend to be objective and rank it as a film in its own right, or as a film if I ignore my despair.

                      Resulting in my Rise of Skywalker rundown immediately after seeing the film (my friend and I have a tradition of watching them, and out of morbid curiosity) being:

                      5/10 as a SW film
                      6/10 as a film
                      9/10 as a Disney SW film
                      10/10 as a Disney SW trilogy repairkit

                      5/10 music
                      0/10 Finn’s new haircut

                      But really, all of those scores would be disingenuous if I were to give a single score. An average of them certainly would be. Because really I despise the whole ethos of the things.

                      And thus I think my score for that can only be useful accompanied by the appropriate context.

                      Which, hmm, I suppose agrees with your standardised ranking scale, if you wanted to quantify it more rigorously with specifics. For me a film that is barely functional doesn’t get a 0, it just doesn’t get rated at all. Exempt.

                      Interesting, let’s see if I can come up with something:

                      0 Film had clear technical issues with video or sound rendering it unfit to watch

                      1 Film you would not watch again even if paid to do so at your normal working rate, during work hours

                      2 Film you found intensely boring, reprehensible, or disagreeable, but would watch again if paid to do so at work during working hours

                      3 Film you disliked to the point of warning others against watching it. If no other entertainment were available, you would last a week before watching it again to check if it held some redeeming qualities

                      4 Film with some redeeming qualities and some issues which make it one that you would rewatch only for very specific moments. Worthwhile putting up with for the sake of others who enjoy it, or for the fun of watching as a group.

                      5 Film with as many redeeming qualities as negative. Rewatchable.

                      6 Film with enough to recommend to others as an entire film, not just for specific scenes.

                      7 Film that you feel is genuinely good and would choose to watch again within a month.

                      8 Film you feel is very good and significant to you personally. Worthy of recommending to everyone regardless of genre. Rewatchable throughout your life.

                      9 Near perfect film that you cherish.

                      10 Your favourite film, bar none.

                      Needs some work, haha :D

      2. Echo Tango says:

        The people review-bombing usually outnumber the people taking the time to write thoughtful, honestly-numbered reviews. That makes it pretty hard to gauge “actual” review averages. Do you always filter out 0/10? Do you only filter reviews like that if they’re part of a mass of same-numbered reviews in a large batch? What about 10/10? Those scores are often from biased fanboys, but they wouldn’t necessarily be as clustered in time. Do only mindless 10/10 people buy on launch day, and do they always buy on launch day instead of later? Cleaning up review data can be done, but it’s not something you get for free.
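        To make the judgment call concrete, the crudest possible version of such a filter is a couple of lines of Python. This is purely a sketch, and whether the rule is defensible is exactly the problem, since it throws away sincere extremes along with the bombs:

          # Crude cleanup rule: drop all 0/10 and 10/10 votes before averaging.
          # Illustrative only -- this discards sincere extreme opinions too,
          # which is the judgment call under discussion.
          def trimmed_average(scores):
              kept = [s for s in scores if 0 < s < 10]
              return sum(kept) / len(kept) if kept else None

          print(trimmed_average([0, 0, 0, 10, 10, 6, 7, 8]))  # 7.0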

      3. Asdasd says:

        Agreed. I think people’s takes on this will be largely informed by whether they actually make use of user reviews or not. If you’re never actually looking at them, of course your opinion is going to be shaped predominantly by headlines on scare stories over review bombs by teh evil gamurz and other moral panics. If you are using them your baseline is much more likely to be the good-faith, constructive reviews that have risen to the top (something else that’s under-appreciated; these systems are often self-vetting), be they either for or against the games in question.

      4. shoeboxjeddy says:

        How would you FIND the well thought out, comprehensive user reviews when 1000 people do either 0’s or 10’s on launch day? Are you suggesting scanning all 1000? What if the well thought out one is a 10, but it’s a really well reasoned 10? How would you know that THIS 10 was actually the most valid one of the whole group, without reading 1000 reviews? What you’re suggesting is pretty unrealistic.

        1. CountAccountant says:

          I think the issue is a misinterpretation of your first sentence: “The problem with user reviews is that they’re bullshit, intentionally so.”

          From the context of both the rest of your text and from Shamus’s article, I read your sentence to be about user review *scores*. I agree with you, and I think your position is generally noncontroversial.

          Geebs says “I completely disagree,” but everything in the comment only discusses user review *text.* He/she argues that the additional information in good user reviews can be valuable for purchasing decisions. Again, a generally noncontroversial position. Many platforms have systems to assist users with finding reviews that other users have rated helpful.

          In other words, I think Geebs is disagreeing with a point that you did not make. You both likely agree on all relevant points.

        2. Agammamon says:

          Look at the mid-range reviews; usually you can skip straight to them. The 3s through 7s.

        3. Geebs says:

          As CountAccountant said above – yes, the easiest way to filter user reviews is to read them. People reviewing in bad faith usually reveal themselves in the first few sentences.

          Your original post came across to me as broadly assuming that all user reviews were inherently worthless, which would be begging the question.

          With respect to the usefulness of numerical Metacritic scores, professional reviews have taken to omitting numerical scores in order to get people to focus on the text, so those are getting interpreted by Metacritic’s “reads like a 7” algorithm anyway.

  9. Asdasd says:

    How is this all affected when we factor in Metacritic’s proprietary ‘black-box’ process for aggregating a Metascore, which isn’t a typical average of all the scores taken, but instead uses per-outlet weightings known only to themselves?

    1. Ninety-Three says:

      Given that we don’t know the weightings, we can’t factor it in.

      The weird thing about Metacritic’s weighting process is that they’ve never disclosed why they weight reviews, or what the weightings are, despite the fact that this isn’t the kind of adversarial endeavour where disclosing your weightings could get the system gamed. They’re being secretive for seemingly no reason. Even the conspiracy theory explanation (that weighting is a lie designed to give them wiggle room to adjust scores in exchange for bribes) requires implausibly many people to be in on it without a word ever having leaked.

  10. Brandon says:

    In fairness, 1995 through 1999 seem to have given us a lot of eternally classic games, like Half-Life, StarCraft, Ocarina of Time, Metal Gear Solid, Resident Evil 2, Baldur’s Gate, NFL Blitz, Donkey Kong Country, NBA Jam, Killer Instinct, MK3, SUPER MARIO 64, Duke Nukem 3D, QUAKE, Gran Turismo, Goldeneye 007, Diablo, Final Fantasy VII, Fallout, Unreal Tournament, Soul Calibur, Chrono Cross, System Shock 2, and more! What an era in gaming.

    1. baud says:

      I think that in any 5-year span since video games became mainstream, you’d find an equivalent number of great games (just not necessarily in all genres). And I think the reason the Metacritic average scores for pre-2001 are that high is that only those great games were entered, instead of the thousands of new games that got released afterward.

    2. Asdasd says:

      Truly a golden age, and a privilege to live through and watch the medium grow. Every other month you’d see a landmark title break ground or a new genre be born. But yes, I think the main reason for the higher average is that the legacy classics got their review scores aggregated while the dross of the era did not.

  11. Interesting data.

    One other thing that might be happening here is that critics will generally be giving scores around the time of release, whereas users might be reviewing it years or decades later, especially older games like those released before Metacritic launched. Presumably the years given here are for the game’s release, rather than the year of the review?

  12. Ibb says:

    I have a slightly different interpretation of the critics’ score trend: If you discount 2006, 2007, and 2010, the average score stayed very close to 70 during 2001-2013, with a deviation of maybe 2 points (2002 and 2005, but it’s hard to read). 2006 and 2007 are easily explained, as Shamus said, by the release of shoddy PC ports. But I can’t figure out 2010. Did several good games come out that year? Did several positive review sites spin up and then die? Did Metacritic change their review acceptance criteria for one year and then change it back?

    If you look at the data with this new idea of “norm of 70 except for a couple of exception years,” then it’s much easier to see the drastic increase in average scores in recent years. 2019 reaches an average of almost 5 points above that norm, and 2014-2018 were all safely above 70. I honestly don’t put much stock in numeric scores, but it’s interesting to see that Metacritic’s average for the past 5 years has been measurably higher than the average over the previous 10 years.

    I don’t think this is due to more good games coming out since 2014, but the glut of games (i.e. indie boom) could mean that fewer bad games are being reviewed. In a world where Custer’s Revenge was coming out alongside 20 games of Mega Man or Sonic the Hedgehog quality, do you think anyone would have given CR a second glance? On the other hand, this could easily explain the sharp drop in user score averages, because many people would still be playing those games, even if the critics aren’t.

  13. Dreadjaws says:

    The fact of the matter is that user reviews and critic reviews are bound to be different because it’s a different experience for each group, particularly in the last few years. Leaving aside particular cases:

    – Critics generally have to play a whole game before reviewing it. Users can play for a bit and if they’re not engaged they can leave it. As such, critics can be more forgiving after finding the game gets better or harsher after being forced to play something they don’t like for so long.
    – Critics have to play a game quickly to release their review as soon as possible. Customers play at their own pace, and as such they experience games differently.
    – Critics play games at release, while customers can do it at any time from release to the current day. This means critics have to deal with unpatched games or, sadly, games that haven’t had anti-consumer measures added yet. Customers, on the other hand, can deal with a whole different product, for better or worse, and may feel that the hype is dead if they get in too late, which can play a role.
    – While not necessarily bribed, critics are often given care packages full of merchandise to incentivize them to be more positive. Sometimes, in the case of games with microtransactions, they’re given a bunch of in-game currency. Customers get nothing of the sort.
    – Most importantly, critics generally don’t pay for the games, while customers do. Paying customers tend to be more critical, particularly if a game offers monetization beyond the retail price.

    They’re really two very different groups of people. I’d be actually surprised if the review scores were more similar.

    1. Daimbert says:

      But then this would be a failure on the part of critics/reviewers, because what they’re really supposed to do is review games and, using their knowledge of games and their audience, rate them on the basis of what the experience should be like for someone in the audience. Other than critiques like that of Shamus or Chuck Sonnenberg, what are reviewers good for if I can’t look at their review and decide whether I want to play the game or not? So if they can’t translate their experience into mine — or, rather, that of an average audience member — then they aren’t doing their job as a reviewer.

      1. PPX14 says:

        Ahh I think beyond the basic mechanical things and bugs there isn’t a huge amount that they necessarily can do, unless you as a consumer seek out those reviewers who seem to agree with your views. Mostly it’s just entertainment!

        I’d say my consumption of game/film review content is 5% consumer advice, 95% entertainment. Unlike e.g. PC hardware, which is the other way round – or was, until I got bored of watching hardware reviews on YouTube haha.

        1. Daimbert says:

          Well, I think that a game reviewer should be able to play a game and identify what overall genre it’s in, what it does well as per the standards of that genre, what it does poorly as per the standards of that genre, what it subverts from the typical play of that genre, what new things the game is doing, and how well those differences work in the overall context of the game. Given this, I think that pretty much everyone who might have interest in that game would be able to read that review and at least get a pretty good idea if it’s something that might interest them. Players who like the genre will be able to identify how good a representative of the genre it is and if the new things are compelling enough to try it, and people who aren’t fans of the genre will be able to determine if the things it does differently appeal to them. Given that an actual professional reviewer, in general, should play enough games in at least one genre to do that, this doesn’t seem like something they’d be unable to do, and it’s at least what those seeking out the review would generally want.

      2. shoeboxjeddy says:

        It is impossible for the reviewer to pretend to be you during their review process. For the obvious reason: there are MILLIONS of “you’s” out there possibly wanting to read their reviews, and all the yous are different and have different tastes. Some people will play the new XCOM games and say “this is too complicated, and why are we taking turns? I want to aim the gun myself, like in Halo!” Some would play the same game and say “this is too dumbed down, why can’t I take all the complex actions that were possible in the original Xcom?” The only sensible course for the reviewer to take is to write the most descriptive review they can from their own perspective, and people will gain information from it that they can apply to their own tastes. What I sense from your comment, Daimbert, is a whiff of the “just write an OBJECTIVE game review!” perspective, which is a completely nonsense thought experiment that is nevertheless surprisingly common.

        1. Daimbert says:

          Well, I’d be asking them to write a DESCRIPTIVE review, yes, but that doesn’t mean that there’s any meaningful “from their perspective” that has to matter. For your example, all the reviewer needs to say, at least primarily, is that it’s turn-based, and describe how complex it is compared to comparable games that the customers have probably played before. Ideally, the reviewer should be able to say if the game will move fast enough for FPS fans or is complex enough for people who liked the complexity of the original. I’m not asking them to get into any one person’s head, but to at least be able to identify audiences and outline the basic things that those audiences tend to care about and/or like.

          To put it in one point, a good reviewer should be able to play a game and say what audiences might be attracted to that game and what audiences won’t like it. This doesn’t mean that everyone in those audiences will think that way, but that’s what the rest of the review is for: to outline the details of the game so each audience can personalize as necessary. But if a reviewer can only say whether or not THEY like the game, I have to say that that’s a bad reviewer.

          EDIT: As an example, in my own post on Huniepop I commented that the dating sim elements aren’t deep enough for dating sim fans and are too prominent for puzzle fans, in general. But my overall discussion highlighted that it was a mix of the two, which then could lead people like myself who can enjoy both to give it a chance. I think a professional reviewer could at least do that and probably do better.

  14. Gnarl says:

    Well, review scores from anyone are a pointless metric. Any criticism for something as complex as a game summed up as a number should be ignored as meaningless. So I’d suggest that this project needed very much the opposite of more time.

    1. aitrus says:

      Yeah, in my experience scores are generally bad and unhelpful in deciding both a. what the objective (lol) value of a game is and b. whether or not I personally will enjoy (or otherwise find value in) a game. So what is the point of a score? This kind of analysis Shamus is doing is still interesting to me though.

      1. Sleeping Dragon says:

        I guess this is as good a place as any to post my “scores are BS” rant. So here’s the thing: people above have mentioned all sorts of issues with both reviewer and player scores, like review bombing on the one hand and conflicts of interest on the other, but I think this doesn’t tackle the core issue that, well, scores are BS.

        The concept of numerically expressed scores is tied to the (laughable, in this case) notion of “objectivity”. Numbers are objective. 3 is larger than 2, 6 is twice as big as 3; it’s a fact (stay away, ye math academics, let’s keep it simple), it’s objective. As such, if we give one game a 7 and another an 8… well, the numbers are objective, so this is clearly stating that the second game is better, right? So if we give both games a 7 then they are clearly the same? It’s like the old joke-riddle about which is heavier: a kilogram/pound of feathers or a kilogram/pound of lead. The answer is obviously that they weigh the same, but it doesn’t mean that lead is equally good for stuffing your pillow or that feathers are going to protect you from Superman ogling your privates.

        And that’s the crux of my issue with scores. While we could argue about certain qualities being more measurable than others (though in all honesty I’d debate most of those that immediately come to mind as well), the key quality of “enjoyment”, even for games in the same genre, is NOT objectively measurable. That’s not even getting into the fact that different games* aim to provide different experiences, not all of them attempting “enjoyment” or “entertainment” in the most obvious understanding of these terms, nor that people play even the same game differently, looking for different things. Because we use the umbrella term “video games” and argue that scores assign objective value, we’re effectively getting into fights where we insist that if you want your home vacuumed you should buy this vegetable chopper, because it has better scores than the vacuum cleaner and they’re both home appliances, and then we bitch about people downvoting the chopper because it didn’t vacuum the house properly!

        *And let’s just use the term and not get into it in this particular rant.

        1. Asdasd says:

          I agree that the more granular you try to be, the sillier it gets. Much as I loved PC Gamer’s reviews growing up, the idea that there was a substantive difference between an 83% and an 84% was asinine.

          But I think there’s some value in scores where they map to broad, useful categories. I actually think the maligned 5-star system has value, for instance, but not as a mathematical or numeric measure. Better to think of each star as a meaningful category:

          1 star – not enough value to be worth playing; mostly or wholly flawed
          2 star – some value, but outweighed by the flaws; might be worth a look (but caveat emptor)
          3 star – some flaws, but outweighed by the value; probably worth a look (but caveat emptor)
          4 star – definitely enough value to be worth playing
          5 star – overwhelmingly enough value to be worth playing; a landmark title for the genre/medium

          The scale still skews towards the positive, but I think as consumers these are broadly the meaningful categories we place things in – bad, mediocre, good, great, amazing.

          1. Sleeping Dragon says:

            I mean, it’s going to be an opinion, and it’s not my place to tell people what tools to use in their consumer decisions, but using broad categories still doesn’t solve the core problems in my opinion. In fact I suspect the silly granularity was a response to issues with “broad categories”. On the one hand it’s ridiculous to think that a reviewer has some kind of innate capability of determining a game is 1% worse/better than another game; on the other, if you use broad categories you’ll end up bunching games of varying quality together, and trust me, this will absolutely not resolve any conflicts in the community.

            But more importantly, it doesn’t resolve the fundamental issues I mentioned, since your proposition is, again, based on the assumption that we have some way of objectively measuring the value of a game. Which is preposterous, as it will vary from person to person based on what they enjoy, the amount of free time they have, and their financial means. I mean, I’m a fan of Spiderweb games; they’re among the few titles I’m likely to buy on release. So clearly they should be a 5, right? But they’re hardly revolutionary, Spiderweb has been making very similar games for a long time, and while they represent a certain quality they’re not really going to push the isometric RPG genre forward. So let’s make it a 4? Except their production values tend to be on the low side, in fact they’ve often been criticized for “ugly” graphics and minimalist sound, and I know for a fact it turns a lot of players off… so 3? I mean, realistically, could I ever give a game more than 3 “objectively”? Or even 2? I don’t know what the people looking at my rating value, so literally everything is either “might be worth a look” or “probably worth a look”.

            So at this point you* could argue two ways. One is that aggregate scores will resolve the issue, but the very presence of (genuine) low scores means that the scoring system has failed at least some of the customers. Second is that people can actually read the review to figure out if the reviewer enjoyed the story or gameplay, or whether the lowering of the score is due to graphics or writing… which again shows that the score is a poor indicator, because you have to read the review anyway. I am willing to concede that Lino’s and Daimbert’s argument below has some merit: you can filter games by high rankings and delve deeper into select titles. But even then, claiming that there can be one “video game scoring system” is, particularly from the point of view of a specific player/customer, not even comparing apples and oranges; it’s comparing vacuum cleaners and potato peelers.

            *Not you specifically, the hypothetical everycommenter you.

        2. Daimbert says:

          Well, this is probably a good place to put in my comment on scores.

          For determining whether or not I should or want to play a game, the review score is useless, even if it comes from a good reviewer. What I’m interested in is the actual text review itself, where the reviewer talks about the game and WHY they gave it a particular score. Because from there I can read that and decide if the things they liked about it are things that I like, and if the things they disliked are things that I might like or at least be able to live with. I have on a few occasions bought games based on negative reviews, because the things they complained about are things that I actually liked in a game. So I always want to know why a game gets a certain score, and not just what score it gets, which makes these sorts of analyses somewhat pointless for me.

          That being said, I like some kind of scoring system on a review so that I know what their overall assessment of the game is. I mostly researched games on Gamefaqs, and I’d read the reviews with the highest scores and the ones with the lowest scores to see what people, at least, thought were good and bad about a game. Without scores, I can’t do that, nor can I prepare myself for how I’m going to have to interpret the review (i.e. look to see if what they dislike is what I like, or try to filter through their likes to see if it interests me).

          So, for me, determining which games are “good” by a reviewer’s or even multiple reviewers’ scores is probably more of an academic exercise than a consumer exercise. But scores are useful as, at a minimum, a guide to what the general opinion is, so that you know what to expect.

          1. Lino says:

            In the age of hundreds (or thousands) of reviews, using scores as a filter is an indispensable feature. Although I haven’t used it for games, I’ve used this technique multiple times for online courses (which usually use the standard 5-star system).

            This is why replacing the 5- or 10-score system with a simple “Thumbs Up/Thumbs Down” isn’t a perfect solution, because there’s no way to delineate between strong likes, dislikes, and all the grades in-between.

            That being said, I don’t read or watch game reviews anymore, but when I did, I almost never looked at the score, just at what the reviewer had to say. In cases where I wasn’t all that crazy about the game, I would just read the Pros and Cons breakdown at the end, and see if something caught my attention.

    2. jpuroila says:

      I would argue that while individual scores are meaningless, looking at them in aggregate should tell us something.

      At least, if we can assume that the majority of reviewers are doing their best to give the games a fair score, which is a point of contention judging by the discussion above.

    3. PPX14 says:

      I disagree with this – Total Biscuit used to say much the same, and I had plenty of time, while listening to him rant about it, to form my own opinion haha.

      I think that ultimately, when people describe how much they like a game – if that is the goal, describing their opinion, as opposed to giving specific consumer advice – they will end up describing the different aspects using adjectives, and the strength of those adjectives tells us how much they liked or disliked the element being described. And from all this we build a picture of their overall enjoyment of the game.

      Of course, I’m sure it can be argued that a measure of the critic’s enjoyment of the game is irrelevant, it is the description of the elements of the game that form useful consumer advice, and that’s fine.

      But I think that a numerical system is just another way to depict what would usually be demonstrated by this “adjective strength”. One person says that was fantastic. Another says 9/10. Another might say brilliant, and this he might translate into 8/10. Whatever it is, the numbers are a nice shorthand that can be compared easily across the person’s various opinions. So that when I hear that Angry Joe said that the gameplay and story were “great”, and the characters were “fun”, and he gave it a 7/10, I know that is different to when he said those elements were “really good”, and gave it a 9/10.

  15. John says:

    Hey, Shamus, after further consideration, I think there’s a potentially very serious methodological problem with your analysis. It all depends on how you compute “average critic score per year” and “average user score per year”. I’ll try to show you what I mean with a simple example. Suppose that there were two games released in the year 20XX. Metacritic has one critic score for Game A and one user score. The critic score is 9 and the user score is 9. There is no difference of opinion between critics and users on Game A. Metacritic has one critic score for Game B and two user scores. The critic score is 7 and the two user scores are also 7. Again, there is no difference of opinion between critics and users on Game B. However, the average critic score for 20XX is 8 (the average of 9 and 7) while the average user score is approximately 7.67 (the average of 9, 7, and 7).

    It looks like critics gave out higher review scores even when we know that by construction critic and user opinions are exactly the same. The apparent difference is false, an artifact of the way we computed average scores for the year. If I’m right about your methodology then your graphs could easily be explained by two factors: first, an increase over time in the number of users relative to critics and, second, a tendency by users to submit disproportionately more reviews for low-scoring games. If I had the data I think you do and I wanted to compare critic review scores to user review scores, I’d study the difference between the average critic score and the average user score on a game-by-game basis.
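    To make the arithmetic concrete, here’s the toy example from above as a few lines of Python – just a sketch of the two averaging methods, showing how they diverge on identical underlying opinions:

      # Hypothetical year 20XX: one critic score per game,
      # and Game B has an extra user review.
      critic_scores = {"Game A": [9], "Game B": [7]}
      user_scores = {"Game A": [9], "Game B": [7, 7]}

      def pooled_average(scores_by_game):
          # Dump every individual review into one pool, then average.
          pool = [s for scores in scores_by_game.values() for s in scores]
          return sum(pool) / len(pool)

      def per_game_average(scores_by_game):
          # Average each game first, then average the per-game averages.
          per_game = [sum(s) / len(s) for s in scores_by_game.values()]
          return sum(per_game) / len(per_game)

      print(pooled_average(critic_scores))    # 8.0
      print(pooled_average(user_scores))      # ~7.67 -- the spurious gap
      print(per_game_average(critic_scores))  # 8.0
      print(per_game_average(user_scores))    # 8.0 -- the gap vanishes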

    I’m sorry if I’ve mis-characterized your analysis, but you don’t explain how you compute the average score for each year and group. If you averaged the score for each game and then used those averages to compute the average score for the year, then the problem isn’t as serious. I’m also sorry if I seem overly picky. Once upon a time it was my actual, literal job to do this kind of thing to other people’s data analysis and old habits die hard.

    1. PPX14 says:

      Good point – but suuuurely it’s game by game?

      1. John says:

        I thought so. As I said, that’s how I’d do it. But something another commenter said got me thinking and when I re-read the article it wasn’t clear.

  16. Hector says:

    Also, Shamus, are you putting your Bethesda article up? I’ve watched the video (…three times now) but there’s no article and the link from YouTube is dead.

  17. evilmrhenry says:

    I grabbed a data set from https://www.kaggle.com/skateddu/metacritic-games-stats-20112019/data and looked up the games with the most difference between the critic and user scores:
    NBA 2K18 (nasty microtransactions)
    FIFA 19 (nasty microtransactions)
    Company of Heroes 2 (nazi apologia, according to the negative Metacritic reviews, also an unbalanced multiplayer. I offer no comment on the truth of either statement.)
    Call of Duty: Infinite Warfare – Sabotage (not enough scores, basically)
    Out of the Park Baseball 17 (Not sure what the deal is here. Nothing bad in the actual user reviews.)
    Chime Sharp (XBOX One version, not enough scores)
    Star Wars Battlefront II (Yeah, that game)
    Madden NFL 19 (microtransactions)
    NBA 2K19 (microtransactions)
    Call of Duty: WWII – The War Machine (not enough scores)
    FIFA 18 (surprisingly, not microtransactions, just people not liking the game)
    Artifact (microtransactions)
    Minecraft: Story Mode Season Two – Episode 1: Hero in Residence (not enough scores)
    The Inner World (not enough scores)
    Battlefield V (possible actual review bomb, but it’s also a Battlefield game.)
    Fortnite (microtransactions. Also, a lot of these reviews are from before the battle royale mode was added, and the game was a bit bad then.)
    Batman: The Enemy Within – Episode 1: The Enigma (not enough reviews)

    My thoughts:
    1) It seems users are not okay with microtransactions, and will rate a game that they would otherwise enjoy at 0 just because of them. (Not that I blame them.) Reviewers generally don’t seem to include the cash shop model in their score calculation. (A lot of this is probably review bombing in a technical sense, but the term means something different now.)
    2) If you have a game that nobody reviewed, someone is going to come by and rate it 0/10.

    From 2016 to the present, there has been a rapid increase in the number of microtransaction-driven games, which does match up well with the rapid shift in user opinions in your graph above. What I’m not sure of is whether there are enough microtransaction-driven games to move the entire industry. These are usually tentpole releases, and there are a lot of games released in a year.

    With that in mind, the most important question is whether “average user reviews” in your chart are per-game, or just the average score across all reviews. (I.e., would a trillion 0/10 ratings on a single game just affect that game, or would they drop the grand average to just above 0?) If it’s not per-game, it’s likely that the massive number of downvotes on high-profile games that use microtransactions is having an actual effect on the average user rating. If not, there’s something else going on.
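    For anyone who wants to poke at the same data, this is roughly the shape of the computation (pandas; the filename and column names here are my assumptions, so check the actual CSV header before running):

      import pandas as pd

      # Data set: kaggle.com/skateddu/metacritic-games-stats-20112019
      # Column names ("game", "metascore", "user_score") are assumed;
      # rename to whatever the CSV actually uses.
      df = pd.read_csv("metacritic_games.csv")

      # Metascores run 0-100 and user scores 0-10, so normalize first.
      df["gap"] = df["metascore"] / 10.0 - df["user_score"]

      # The games where critics were most generous relative to users.
      cols = ["game", "metascore", "user_score", "gap"]
      print(df.sort_values("gap", ascending=False).head(20)[cols])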

    1. Higher_Peanut says:

      In regards to Fortnite: it was pitched and sold in early access as a very different game, so not only are some of the reviews from an outdated version, it also collects reviews from those who bought in and felt ditched when the focus switched.

      With how much microtransactions affect the experience and balance of games, I’d love to see them discussed and included in reviews. A pity we’re already at the point where they’re patched in later. Honestly, for some games the content itself is patched in later, but only if the game has the potential to generate the money. Reviewers can all get wildly varying experiences over time.

  18. Decius says:

    Is it easy to do a graph of the average absolute difference between the critic score and the user score, per year?

    That is: for each year, the average over all games k of |critic score(k) − user score(k)|.
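    In pandas that would only be a couple of lines, assuming a per-game table like the Kaggle set evilmrhenry mentions above (column names hypothetical):

      import pandas as pd

      # Assumed columns: "year", "metascore" (0-100), "user_score" (0-10).
      df = pd.read_csv("metacritic_games.csv")

      # Mean absolute critic-vs-user gap, grouped by release year.
      df["abs_gap"] = (df["metascore"] / 10.0 - df["user_score"]).abs()
      print(df.groupby("year")["abs_gap"].mean())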

  19. Chad Miller says:

    Something just happened that nicely ties in to all of the review bombing discussion.

    Blizzard just released Warcraft III: Reforged, and it’s reportedly horrible. Worse, they’ve shut off the servers for the original Warcraft III so people who have that game now can’t play online without downloading Reforged. Would it be review-bombing to go 0-rate a game I don’t intend to play because it was used as a pretext to ruin another game by the same company that I already have?

    https://www.metacritic.com/game/pc/warcraft-iii-reforged
