It’s Time To Abandon Star Ratings

It’s time to stop using star ratings.

Professional critics have long rated books and movies on a scale of 1 to 5 stars, and for the most part, it’s worked out as intended: A 1-star movie is terrible, a 5-star movie is excellent, and a 3-star movie is pretty good. This system works reasonably well for two main reasons. First, professional critics can watch all the major movies, or at least a broad and representative selection of them: the good, the bad, and the ugly. Second, professional critics ideally have no direct relationship with studios and publishers; they occupy a different segment of the industry, and their access to material for review does not typically depend on the reviews they give. (Cases where reviewers were punished for the content they published, like when gaming site Kotaku was blacklisted by Bethesda, are rightly regarded as extremely bad behavior on the part of the studio.)

The result is a spread of ratings that reflects the full range of quality in books and movies. Aggregate data compiled by FiveThirtyEight in their article on Fandango demonstrates this: reviews from Rotten Tomatoes are fairly evenly distributed between 1/2 and 5 stars, and reviews from Metacritic between 1 1/2 and 4 1/2 stars.

The graph also includes four sets of user-generated ratings, and here the problem becomes obvious. While Rotten Tomatoes users are fairly restrained and their rating distribution is not too different from Metacritic’s, IMDB and Metacritic users produce the distribution we all know and love from sites like Amazon and Goodreads: a huge spike at 3 1/2 stars and virtually nothing below 2 1/2. All told, some 85% of IMDB user ratings fall in the 3-to-4-star range. (If you’re wondering why Fandango ratings skew noticeably higher, read the FiveThirtyEight article.)

The reasons are simple effects of human behavior. Regular people mostly watch things they expect to like, and they’re usually right. People are more motivated to rate things they feel strongly about, and absent any need to appear impartial, they can freely lavish 5-star reviews on anything they like. Balancing those people out are the ubiquitous irate users who give books and movies low ratings because an online seller shipped something too slowly or because it wasn’t a pack of Jimmy Dean sausages. And then there are the men who systematically give 1-star ratings to shows aimed at women.

Aside from that last one, all these impulses are benign and the people involved have every right to rate the work in question that way (well, maybe not the Jimmy Dean guy). But this disparate tangle of motivations gets flattened into a single aggregate number that almost always ends up being a little bit above average.

XKCD by Randall Munroe

How do you discern a movie’s actual quality from this number? You don’t. Nobody uses IMDB user ratings as a guide to pick movies. At best, you can use it to separate normal-quality works from extremely bad ones, and that’s it. The only sites where user ratings were ever of any real use were places like Netflix that used algorithms to adjust the score shown. Since the algorithm was proprietary, we have no way of knowing how much other users’ ratings actually factored in compared to the user’s own ratings, genres, demographics, and other factors; it may have been a very small amount indeed.

In the book world, the factors at play are even more complicated. Here the large pool of authors in need of publicity and the large pool of reviewers in need of traffic can and do interact as important parts of the thriving online book community. While the community is a good thing, it introduces a whole new web of motivations and consequences. Both authors and reviewers benefit when all books receive positive reviews; indeed, most reviewers won’t even post a review below 3 stars.

This raises the question of whether 3 stars still means average or whether it’s effectively 1 star, since it’s the lowest rating anyone actually awards. The answer may vary from reviewer to reviewer. Further, if declining to review a book is a coded negative review, then the number of reviews a book has becomes a proxy for its quality; but that signal is muddied, too. How do you tell the difference between a book that has few reviews because it’s bad and a book that has few reviews because it’s obscure?

Again, none of this is wrong; it’s how you maintain a healthy, positive community. But in the process, the star rating becomes further and further removed from a simple evaluation of the book’s quality. The vast majority of novels written by people within the book community have Goodreads ratings above 3 1/2 and below 4 1/2 stars. Essentially the only way a book can move out of this range is if it attracts mass disapproval, but even that isn’t useful information, because the book may have been targeted for any number of reasons, ranging from extremely offensive content to an ex-boyfriend with a grudge.

Star ratings may seem like an essential part of a review, because you have to communicate whether or not you liked the book in some way, but notice that professional journals like Kirkus and Booklist don’t use them. Instead, they award particularly good books a single star. This cuts to the heart of what people really want to know when they look at a book’s star rating: Is it good or not? Fundamentally, people won’t make different decisions based on whether you give a book four stars or five (or on the other end of the spectrum, one star or two); they want to know whether or not you recommend it, and that’s a yes/no question.

This binary nature is reflected in many other rating systems. Rotten Tomatoes flattens all ratings into a simple fresh/rotten binary and aggregates from there; its ratings are some of the most reliable and widely consulted by actual moviegoers. Metacritic uses a three-tier system of “mostly positive,” “mixed,” or “mostly negative,” again much simpler than a star rating system and more closely related to the basic question of “Should I watch/read/play this?”
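To make the flattening step concrete, here is a rough Python sketch of the general idea; it is not Rotten Tomatoes’ or Metacritic’s actual methodology, and the 3-star cutoff for “fresh” is my own illustrative assumption:

```python
def tomatometer(star_ratings, fresh_cutoff=3.0):
    """Flatten each star rating to fresh/rotten, then report the
    percentage that came out fresh. The cutoff is illustrative."""
    fresh = sum(1 for r in star_ratings if r >= fresh_cutoff)
    return round(100 * fresh / len(star_ratings))

# 7 of 10 critics liked it: a direct answer to "should I watch this?"
print(tomatometer([4, 3.5, 4.5, 3, 5, 4, 3.5, 2, 1.5, 2.5]))  # 70
```

The output is a single percentage that answers the recommend/don’t-recommend question, rather than an average that blurs it.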

Advocates of star ratings usually argue that they’re useful because they add nuance. But aggregate ratings inherently lack nuance. You can’t maintain fine distinctions of opinion when you lump hundreds of reviews together into a single number. Based on rating alone, it’s impossible to tell the difference between an excellent work that got targeted by a hate mob, a good work with a few major problems, an okay work, a not so good work with a loyal fan base, and a terrible work that’s become an ironic cult classic: All of them end up with about three stars. A star rating could add nuance to an individual review…but so could just reading the text of the review.
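To see why averaging erases these distinctions, consider a set of hypothetical rating profiles (the numbers are entirely invented for illustration). Each one describes a very different audience reaction, yet each lands at about three stars:

```python
from statistics import mean

# Hypothetical rating profiles (numbers invented for illustration).
profiles = {
    "excellent work hit by a hate mob":    [5] * 50 + [1] * 50,
    "good work with a few major problems": [4] * 60 + [2] * 20 + [1] * 20,
    "thoroughly okay work":                [3] * 100,
    "weak work with a loyal fan base":     [2] * 60 + [5] * 40,
    "terrible work turned ironic classic": [1] * 55 + [5] * 45,
}

for label, ratings in profiles.items():
    print(f"{label}: {mean(ratings):.1f} stars")
```

The averages come out nearly identical even though the underlying audiences disagree wildly about what they just read or watched.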

The nail in the coffin of star ratings is the old reliable principle of subjectivity. Two readers may agree completely on the strong and weak points of a book, but give it different ratings depending on their personal preferences. For one person, phenomenal execution may completely make up for a somewhat implausible premise; for another, if the premise doesn’t hold up, the execution is just turd-polishing. One person won’t even notice an underwhelming action scene if there’s swoon-worthy romance; another can overlook tepid relationships if there’s kickass action. A few people won’t forgive any weakness and give much lower average ratings. In order to extract any value at all from a rating, you need to know who wrote it and what they value in a book.

The solution to all these problems is the same: Instead of looking at the number, read the whole review.

By reading the actual reviews, you can easily filter out both the people who are angry about Jimmy Dean sausages and the author’s doting relatives. If a book attracted mass disapproval, you can identify why it happened. You can see which aspects of the book the reviewer liked the most and the least, which allows you to decide for yourself whether you’re likely to feel the same way, rather than taking it on faith that the majority of the population shares your opinions (a premise that essentially never holds true for me, at least). It allows reviewers to point out triggers, potentially problematic issues, and minor problems that wouldn’t merit a lower rating but that readers would want to know about. Overall, a full review allows for a more complex, meaningful conversation. Don’t we all benefit from that?

So let’s stop trying to cram all our thoughts and opinions into a single number between one and five and move away from star rating systems.
