---
title: IMDB Top 250 outliers
date: "2006-04-04T12:00:00Z"
categories:
  - how-i-do-things
wp_id: 376
description: I used Excel to analyze the IMDb Top 250, identifying outliers in the correlation between ratings, vote counts, and release years. I found that popularity doesn't always match quality for classics like Seven Samurai or blockbusters like The Matrix.
keywords: [imdb, data analysis, excel, movie ratings, correlation, outlier detection]
---

On the [IMDb top 250](http://www.imdb.com/chart/top), you normally see a correlation between the number of votes and the rating for a movie. Better-rated movies are watched more. The outliers are interesting.

[![IMDb: Correlation between number of votes and rating](/blog/assets/flickr-imdb-correlation-between-number-of-votes-and-rating_123075107_o-gif.webp)](/blog/assets/flickr-imdb-correlation-between-number-of-votes-and-rating_123075107_o-gif.webp)

The movies that are popular despite not having a high rating are:

- [The Matrix](http://www.imdb.com/title/tt0133093/)
- [The Sixth Sense](http://www.imdb.com/title/tt0167404/)
- [Gladiator](http://www.imdb.com/title/tt0172495/)
- [Star Wars 3: Revenge of the Sith](http://www.imdb.com/title/tt0121766/)
- [Pirates of the Caribbean](http://www.imdb.com/title/tt0325980/)

I can understand why Star Wars, Pirates of the Caribbean and especially The Matrix are on this list -- geeks would have watched these and voted on IMDb, though their ratings need not have been high. But why are Gladiator and The Sixth Sense on that list?

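The charts in this post were made in Excel. For anyone who would rather script the same idea, here is a minimal Python sketch, not the method used for the charts. It makes a few assumptions of its own: the data is a list of `(title, rating, votes)` rows, vote counts are put on a log scale, a straight line is fitted through them, and a movie counts as an outlier when it sits more than 1.5 standard deviations off that line. The names `fit_line` and `find_outliers`, the threshold, and the sample rows are all made up for illustration; the rows are placeholders, not real IMDb numbers.

```python
import math
from statistics import mean, pstdev


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x. Returns (a, b).

    Assumes the xs are not all identical (otherwise the slope is undefined).
    """
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def find_outliers(movies, threshold=1.5):
    """Split outliers into (more votes than expected, fewer votes than expected).

    movies: list of (title, rating, votes) tuples.
    threshold: how many standard deviations off the fitted line counts
               as an outlier (an arbitrary choice for this sketch).
    """
    ratings = [rating for _, rating, _ in movies]
    log_votes = [math.log10(votes) for _, _, votes in movies]  # assumed log scale
    a, b = fit_line(ratings, log_votes)

    # Residual = actual log-votes minus what the rating alone predicts.
    residuals = [y - (a + b * x) for x, y in zip(ratings, log_votes)]
    spread = pstdev(residuals)

    more = [m[0] for m, r in zip(movies, residuals) if r > threshold * spread]
    fewer = [m[0] for m, r in zip(movies, residuals) if r < -threshold * spread]
    return more, fewer


# Placeholder rows (NOT real IMDb data): five movies on a clean trend,
# one with far more votes than its rating suggests, one with far fewer.
sample = [
    ("Movie A", 8.0, 10_000),
    ("Movie B", 8.2, 15_849),
    ("Movie C", 8.4, 25_119),
    ("Movie D", 8.6, 39_811),
    ("Movie E", 8.8, 63_096),
    ("Movie F", 8.4, 251_189),
    ("Movie G", 8.4, 2_512),
]

more_votes, fewer_votes = find_outliers(sample)
```

With these placeholder rows, `more_votes` picks up "Movie F" (popular despite its rating) and `fewer_votes` picks up "Movie G" (rated well for how few votes it has). On real data the outliers themselves pull the fitted line around, so the threshold would need some fiddling.
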
Movies that are highly rated, but not as popular are:

- [The Godfather 1](http://www.imdb.com/title/tt0068646/)
- [The Godfather 2](http://www.imdb.com/title/tt0071562/)
- [Seven Samurai](http://www.imdb.com/title/tt0047478/)
- [Rear Window](http://www.imdb.com/title/tt0047396/)
- [The Good, the Bad and the Ugly](http://www.imdb.com/title/tt0060196/)

Seven Samurai and The Good, the Bad and the Ugly probably didn't get the votes they deserve because they're listed under their Japanese and Italian titles on IMDb. I hadn't seen them for a long time for the same reason. As for The Godfather, I personally think it's just overrated. But Rear Window? That's a surprise. Hitchcock thriller with all the classic elements...

Another correlation is between the rating and the year of the movie. Early movies get lower ratings than recent movies. Technique could be the reason, but I doubt it. In any case, some movies stand out for their time.

[![IMDb: Correlation between rating and year of movie](/blog/assets/flickr-imdb-correlation-between-rating-and-year-of-movie_123075103_o-gif.webp)](/blog/assets/flickr-imdb-correlation-between-rating-and-year-of-movie_123075103_o-gif.webp)

- [The Shawshank Redemption](http://www.imdb.com/title/tt0111161/)
- [The Godfather 1](http://www.imdb.com/title/tt0068646/)
- [The Godfather 2](http://www.imdb.com/title/tt0071562/)
- [Seven Samurai](http://www.imdb.com/title/tt0047478/)
- [Rear Window](http://www.imdb.com/title/tt0047396/)
- [Casablanca](http://www.imdb.com/title/tt0034583/)
- [Citizen Kane](http://www.imdb.com/title/tt0033467/)
- [Metropolis](http://www.imdb.com/title/tt0017136/)
- [M](http://www.imdb.com/title/tt0022100/)
- [Modern Times](http://www.imdb.com/title/tt0027977/)
- [City Lights](http://www.imdb.com/title/tt0021749/)

I haven't seen Metropolis or M. But among the others, I think Citizen Kane is the one that deserves to stand out, if only for portraying the anti-hero, and for not having a happy ending.

The Shawshank Redemption was a bit of a surprise. Few people that I know have heard of it. And yet, there it is, right on top.

---

## Comments

- **Madhu** _4 Apr 2006 10:56 am_: Some analysis this:) were u consulting for IMDB sometime?:)
- **S Anand** _4 Apr 2006 12:22 pm_: Nah, just had some time on my hands early this morning!
- **ritzkini** _4 Apr 2006 2:29 pm_: :D cool anal ! but..how did you get the raw data ?? is my question !
- **S Anand** _4 Apr 2006 3:53 pm_: Just cut and paste data on the IMDb top 250 page on Excel!
- **Shankar** _5 Apr 2006 3:24 am_: Another conclusion one could draw: The conclusion that a movie is good seems representative of public opinion due to the high number of votes, but that a movie is bad is only the conclusion of a few, and hence may not be representative of public opinion. Is this a correct conclusion?
- **S Anand** _5 Apr 2006 6:04 am_: I wouldn't say that from this data. These represent the top rated movies on IMDb, i.e. the top 250 movies EVER. We can only say that high ratings are contributed by both large and small samples. Even that would be a weak statement because IMDb has a "minimum number of votes" cutoff for the top 250.
- **Govar** _17 Jan 2007 6:30 pm_: Another interesting thing I've noticed in IMDB ratings is that a lot of votes are meant either to push up or pull down. For example, almost every movie will have more number of people rating 1 than ratings 2, 3 or 4. Which could mean only one thing: They want to bring down the rating. Similarly, lot of people go to the other extreme - 10 - instead of say 8 or 9.
- **S Anand** _17 Jan 2007 7:40 pm_: That's an interesting observation... did you get that out of the raw IMDb data? I'll have a look at that. Should be interesting analysis to do.
- **fdf** _24 Apr 2007 1:04 pm_: imdb top 250 is not good, don't listen to it, ppl vote up a film, and vote films down! Godfather and LOTR is over rated!