{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# *Is Fandango Still Inflating Ratings?*\n", "In October 2015, Walt Hickey from FiveThirtyEight published a [popular article](https://fivethirtyeight.com/features/fandango-movies-ratings/) where he presented strong evidence suggesting that Fandango's movie rating system was biased and dishonest.\n", "\n", "### Goal\n", "In this project, we'll analyze more recent movie ratings data to determine whether there has been any change in Fandango's rating system after Hickey's analysis." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Understanding the Data\n", "We'll work with two samples of movie ratings: the [data](https://github.com/fivethirtyeight/data/tree/master/fandango) in one sample was collected before Hickey's analysis, while the other sample was collected [after](https://github.com/mircealex/Movie_ratings_2016_17). Let's start by reading in the two samples (which are stored as CSV files) and getting familiar with their structure." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " FILM RottenTomatoes RottenTomatoes_User \\\n", "0 Avengers: Age of Ultron (2015) 74 86 \n", "1 Cinderella (2015) 85 80 \n", "2 Ant-Man (2015) 80 90 \n", "3 Do You Believe? (2015) 18 84 \n", "4 Hot Tub Time Machine 2 (2015) 14 28 \n", "\n", " Metacritic Metacritic_User IMDB Fandango_Stars Fandango_Ratingvalue \\\n", "0 66 7.1 7.8 5.0 4.5 \n", "1 67 7.5 7.1 5.0 4.5 \n", "2 64 8.1 7.8 5.0 4.5 \n", "3 22 4.7 5.4 5.0 4.5 \n", "4 29 3.4 5.1 3.5 3.0 \n", "\n", " RT_norm RT_user_norm ... IMDB_norm RT_norm_round \\\n", "0 3.70 4.3 ... 3.90 3.5 \n", "1 4.25 4.0 ... 3.55 4.5 \n", "2 4.00 4.5 ... 3.90 4.0 \n", "3 0.90 4.2 ... 2.70 1.0 \n", "4 0.70 1.4 ... 2.55 0.5 \n", "\n", " RT_user_norm_round Metacritic_norm_round Metacritic_user_norm_round \\\n", "0 4.5 3.5 3.5 \n", "1 4.0 3.5 4.0 \n", "2 4.5 3.0 4.0 \n", "3 4.0 1.0 2.5 \n", "4 1.5 1.5 1.5 \n", "\n", " IMDB_norm_round Metacritic_user_vote_count IMDB_user_vote_count \\\n", "0 4.0 1330 271107 \n", "1 3.5 249 65709 \n", "2 4.0 627 103660 \n", "3 2.5 31 3136 \n", "4 2.5 88 19560 \n", "\n", " Fandango_votes Fandango_Difference \n", "0 14846 0.5 \n", "1 12640 0.5 \n", "2 12055 0.5 \n", "3 1793 0.5 \n", "4 1021 0.5 \n", "\n", "[5 rows x 22 columns]" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "previous = pd.read_csv('fandango_score_comparison.csv')\n", "after = pd.read_csv('movie_ratings_16_17.csv')\n", "previous.head()" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " movie year metascore imdb tmeter audience fandango \\\n", "0 10 Cloverfield Lane 2016 76 7.2 90 79 3.5 \n", "1 13 Hours 2016 48 7.3 50 83 4.5 \n", "2 A Cure for Wellness 2016 47 6.6 40 47 3.0 \n", "3 A Dog's Purpose 2017 43 5.2 33 76 4.5 \n", "4 A Hologram for the King 2016 58 6.1 70 57 3.0 \n", "\n", " n_metascore n_imdb n_tmeter n_audience nr_metascore nr_imdb \\\n", "0 3.80 3.60 4.50 3.95 4.0 3.5 \n", "1 2.40 3.65 2.50 4.15 2.5 3.5 \n", "2 2.35 3.30 2.00 2.35 2.5 3.5 \n", "3 2.15 2.60 1.65 3.80 2.0 2.5 \n", "4 2.90 3.05 3.50 2.85 3.0 3.0 \n", "\n", " nr_tmeter nr_audience \n", "0 4.5 4.0 \n", "1 2.5 4.0 \n", "2 2.0 2.5 \n", "3 1.5 4.0 \n", "4 3.5 3.0 " ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "after.head()" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " FILM Fandango_Stars Fandango_Ratingvalue \\\n", "0 Avengers: Age of Ultron (2015) 5.0 4.5 \n", "1 Cinderella (2015) 5.0 4.5 \n", "2 Ant-Man (2015) 5.0 4.5 \n", "3 Do You Believe? (2015) 5.0 4.5 \n", "4 Hot Tub Time Machine 2 (2015) 3.5 3.0 \n", "\n", " Fandango_votes Fandango_Difference \n", "0 14846 0.5 \n", "1 12640 0.5 \n", "2 12055 0.5 \n", "3 1793 0.5 \n", "4 1021 0.5 " ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fandango_previous = previous[['FILM', 'Fandango_Stars', \n", " 'Fandango_Ratingvalue', \n", " 'Fandango_votes', \n", " 'Fandango_Difference']].copy()\n", "fandango_after = after[['movie', 'year', 'fandango']].copy()\n", "fandango_previous.head()" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " movie year fandango\n", "0 10 Cloverfield Lane 2016 3.5\n", "1 13 Hours 2016 4.5\n", "2 A Cure for Wellness 2016 3.0\n", "3 A Dog's Purpose 2017 4.5\n", "4 A Hologram for the King 2016 3.0" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fandango_after.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Our goal is to determine whether there has been any change in Fandango's rating system after Hickey's analysis. The population of interest for our analysis is made up of all the movie ratings stored on Fandango's website, regardless of the release year.\n", "\n", "Because we want to find out whether the parameters of this population changed after Hickey's analysis, we're interested in sampling the population at two different periods in time, before and after Hickey's analysis, so we can compare the two states.\n", "\n", "The data we're working with was sampled at the moments we want: one sample was taken before the analysis, and the other after the analysis. We want to describe the population, so we need to make sure that the samples are representative; otherwise we should expect a large sampling error and, ultimately, wrong conclusions.\n", "\n", "**Sample Before**
\n", "From Hickey's article and from the [README.md](https://github.com/fivethirtyeight/data/tree/master/fandango) of the data set's repository, we can see that he used the following sampling criteria:\n", "\n", "- The movie must have had at least 30 fan ratings on Fandango's website at the time of sampling (Aug. 24, 2015).\n", "- The movie must have had tickets on sale in 2015.\n", "\n", "The sampling was clearly not random because not every movie had the same chance of being included in the sample. Some movies didn't have a chance at all (like those having under 30 fan ratings or those without tickets on sale in 2015). It's questionable whether this sample is representative of the entire population we're interested in describing. It seems more likely that it isn't, mostly because this sample is subject to temporal trends: for example, movies in 2015 might have been outstandingly good or bad compared to other years.\n", "\n", "**Sample After**
\n", "The sampling conditions for our other sample were (as stated in the [README.md](https://github.com/mircealex/Movie_ratings_2016_17) of the data set's repository):\n", "\n", "- The movie must have been released in 2016 or later.\n", "- The movie must have had a considerable number of votes and reviews (unclear how many from the README.md or from the data).\n", "\n", "This second sample is also subject to temporal trends and it's unlikely to be representative of our population of interest.\n", "\n", "---\n", "**Purposive Sampling**
\n", "Both of these authors had certain research questions in mind when they sampled the data, and they used a set of criteria to get a sample that would fit their questions. Their sampling method is called purposive sampling (or judgmental/selective/subjective sampling). While these samples were good enough for their research, they don't seem too useful for us." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Changing the Goal of our Analysis\n", "\n", "At this point, we can either collect new data or change the goal of our analysis. We choose the latter and place some limitations on our initial goal.\n", "\n", "Instead of trying to determine whether there has been any change in Fandango's rating system after Hickey's analysis, our new goal is to determine whether there's any difference between Fandango's ratings for popular movies in 2015 and Fandango's ratings for popular movies in 2016. This new goal should also be a fairly good proxy for our initial goal." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Isolating the Samples we need\n", "\n", "With this new research goal, we have two populations of interest:\n", "\n", "1. All of Fandango's ratings for popular movies released in 2015.\n", "2. All of Fandango's ratings for popular movies released in 2016.\n", "\n", "---\n", "\n", "**Checking whether movies are popular in both populations**\n", "\n", "We need to be clear about what counts as a popular movie. We'll use Hickey's benchmark of 30 fan ratings and count a movie as popular only if it has 30 fan ratings or more on Fandango's website.\n", "\n", "Although one of the sampling criteria in our second sample is movie popularity, the sample doesn't provide information about the number of fan ratings. 
We should be skeptical once more and ask whether this sample is truly representative and contains popular movies (movies with at least 30 fan ratings).\n", "\n", "One quick way to check the representativeness of this sample is to randomly sample 10 movies from it and then check the number of fan ratings ourselves on Fandango's website. Ideally, at least 8 out of the 10 movies have 30 fan ratings or more.\n", "\n" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " movie year fandango\n", "108 Mechanic: Resurrection 2016 4.0\n", "206 Warcraft 2016 4.0\n", "106 Max Steel 2016 3.5\n", "107 Me Before You 2016 4.5\n", "51 Fantastic Beasts and Where to Find Them 2016 4.5\n", "33 Cell 2016 3.0\n", "59 Genius 2016 3.5\n", "152 Sully 2016 4.5\n", "4 A Hologram for the King 2016 3.0\n", "31 Captain America: Civil War 2016 4.5" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fandango_after.sample(10, random_state=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Above we used a value of 1 as the random seed. This is good practice because it suggests that we weren't trying out various random seeds just to get a favorable sample.\n", "\n", "As of January 2019, these are the fan ratings we found:\n", "\n", "\n", "|**Movie**|**Fan Ratings**|\n", "|---|:---|\n", "|Mechanic: Resurrection | 2250|\n", "|Warcraft | 7280|\n", "|Max Steel | 494|\n", "|Me Before You | 5270|\n", "|Fantastic Beasts and Where to Find Them | 13477|\n", "|Cell | 18|\n", "|Genius | 127|\n", "|Sully | 11889|\n", "|A Hologram for the King | 501|\n", "|Captain America: Civil War | 35143|\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Nine of the 10 movies in our sample (90%) have at least 30 fan ratings; only Cell, with 18, falls short. This is enough and we move forward with a bit more confidence.\n", "\n", "Let's also double-check the other data set for popular movies. The documentation states clearly that it contains only movies with at least 30 fan ratings, but it should take only a couple of seconds to double-check here." 
] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(fandango_previous[fandango_previous['Fandango_votes'] < 30])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Isolating movies released in 2015 from our *fandango_previous* data set**" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " FILM Fandango_Stars Fandango_Ratingvalue \\\n", "0 Avengers: Age of Ultron (2015) 5.0 4.5 \n", "1 Cinderella (2015) 5.0 4.5 \n", "2 Ant-Man (2015) 5.0 4.5 \n", "3 Do You Believe? (2015) 5.0 4.5 \n", "4 Hot Tub Time Machine 2 (2015) 3.5 3.0 \n", "\n", " Fandango_votes Fandango_Difference \n", "0 14846 0.5 \n", "1 12640 0.5 \n", "2 12055 0.5 \n", "3 1793 0.5 \n", "4 1021 0.5 " ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fandango_previous.head()" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " FILM Fandango_Stars Fandango_Ratingvalue \\\n", "0 Avengers: Age of Ultron (2015) 5.0 4.5 \n", "1 Cinderella (2015) 5.0 4.5 \n", "2 Ant-Man (2015) 5.0 4.5 \n", "3 Do You Believe? (2015) 5.0 4.5 \n", "4 Hot Tub Time Machine 2 (2015) 3.5 3.0 \n", "\n", " Fandango_votes Fandango_Difference Year \n", "0 14846 0.5 2015 \n", "1 12640 0.5 2015 \n", "2 12055 0.5 2015 \n", "3 1793 0.5 2015 \n", "4 1021 0.5 2015 " ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# The release year is the four characters inside the trailing parentheses of FILM\n", "fandango_previous['Year'] = fandango_previous['FILM'].str[-5:-1]\n", "fandango_previous.head()" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2015 129\n", "2014 17\n", "Name: Year, dtype: int64" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fandango_previous['Year'].value_counts()" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2015 129\n", "Name: Year, dtype: int64" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Creating a separate data set\n", "\n", "fandango_2015 = fandango_previous[fandango_previous['Year'] \n", " == '2015'].copy()\n", "fandango_2015['Year'].value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Isolating movies released in 2016 from our *fandango_after* data set**" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " movie year fandango\n", "0 10 Cloverfield Lane 2016 3.5\n", "1 13 Hours 2016 4.5\n", "2 A Cure for Wellness 2016 3.0\n", "3 A Dog's Purpose 2017 4.5\n", "4 A Hologram for the King 2016 3.0" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fandango_after.head()" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2016 191\n", "2017 23\n", "Name: year, dtype: int64" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fandango_after['year'].value_counts()" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2016 191\n", "Name: year, dtype: int64" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Creating a separate data set\n", "\n", "fandango_2016 = fandango_after[fandango_after['year'] == 2016].copy()\n", "fandango_2016['year'].value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Comparing Distribution Shapes for 2015 and 2016" ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import matplotlib.pyplot as plt\n", "import matplotlib.style as style\n", "from numpy import arange\n", "%matplotlib inline\n", "style.use('fivethirtyeight')\n", "\n", "# Kernel density estimates of the two rating distributions on shared axes\n", "fandango_2015['Fandango_Stars'].plot.kde(label='2015', figsize=(10, 7.8), color = '#FFB7A2')\n", "fandango_2016['fandango'].plot.kde(label='2016', color='#96BBE6')\n", "plt.title(\"Comparing distribution shapes for Fandango's ratings\\n(2015 vs 2016)\", y = 1.0)\n", "plt.xlim(0, 5)\n", "plt.xlabel('Stars')\n", "plt.xticks(arange(0, 5.1, 0.5))\n", "plt.legend(loc='upper left', fontsize=14)\n", "plt.savefig('density.png')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The most striking aspect of the figure above is that the 2016 distribution is slightly shifted to the left relative to the 2015 distribution.\n", "\n", "This left shift is very interesting for our analysis. It suggests that there was indeed a difference between Fandango's ratings for popular movies in 2015 and Fandango's ratings for popular movies in 2016. It also shows the direction of the difference: ratings in 2016 were slightly lower than in 2015." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Comparing Relative Frequencies" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3.0 8.527132\n", "3.5 17.829457\n", "4.0 28.682171\n", "4.5 37.984496\n", "5.0 6.976744\n", "Name: Fandango_Stars, dtype: float64" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fandango_2015['Fandango_Stars'].value_counts(normalize=True).sort_index() * 100" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2.5 3.141361\n", "3.0 7.329843\n", "3.5 24.083770\n", "4.0 40.314136\n", "4.5 24.607330\n", "5.0 0.523560\n", "Name: fandango, dtype: float64" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fandango_2016['fandango'].value_counts(normalize=True).sort_index() * 100" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In 2016, very high ratings (4.5 and 5 stars) had noticeably lower percentages compared to 2015. In 2016, under 1% of the movies had a perfect rating of 5 stars, compared to 2015 when the percentage was close to 7%. Ratings of 4.5 were also more common in 2015: about 38% of movies were rated 4.5 in 2015 versus about 25% in 2016, a gap of roughly 13 percentage points.\n", "\n", "The minimum rating is also lower in 2016: 2.5 stars instead of 3 stars, the minimum of 2015. There clearly is a difference between the two frequency distributions.\n", "\n", "For some other ratings, the percentage went up in 2016. There was a greater percentage of movies in 2016 that received 3.5 and 4 stars, compared to 2015. 3.5 and 4.0 are high ratings and this challenges the direction of the change we saw on the kernel density plots." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Determining the Direction of the Change" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Fandango 2015\n", "[4.0852713178294575, 4.0, 4.5] \n", "\n", "Fandango 2016\n", "[3.887434554973822, 4.0, 4.0]\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " 2015 2016\n", "Mean 4.085271 3.887435\n", "Median 4.000000 4.000000\n", "Mode 4.500000 4.000000" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def get_stats(data, col):\n", " return list([data[col].mean(), data[col].median(), \n", " data[col].mode()[0]])\n", " \n", "print('Fandango 2015')\n", "print(get_stats(fandango_2015, 'Fandango_Stars'), '\\n')\n", "\n", "print('Fandango 2016')\n", "print(get_stats(fandango_2016, 'fandango'))\n", "\n", "summary = pd.DataFrame(index=['Mean', 'Median', 'Mode'])\n", "summary['2015'] = get_stats(fandango_2015, 'Fandango_Stars')\n", "summary['2016'] = get_stats(fandango_2016, 'fandango')\n", "summary" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "summary['2015'].plot.bar(color = '#FFB7A2', align = 'center', label = '2015', width = .25)\n", "summary['2016'].plot.bar(color = '#96BBE6', align = 'edge', label = '2016', width = .25,\n", " rot = 0, figsize = (8,5))\n", "\n", "plt.title('Comparing summary statistics: 2015 vs 2016', y=1.07)\n", "plt.ylim(0,5.5)\n", "plt.yticks(arange(0,5.1,.5))\n", "plt.ylabel('Stars')\n", "plt.legend(loc = 'upper center')\n", "plt.savefig('summary.png')\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.04842683568951993" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(summary.loc['Mean', '2015'] - summary.loc['Mean', '2016']) / summary.loc['Mean', '2015']" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "The mean rating was lower in 2016 by approximately 0.2 stars. This means a drop of almost 5% relative to the mean rating in 2015." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Conclusion\n", "\n", "Our analysis showed that there's indeed a slight difference between Fandango's ratings for popular movies in 2015 and Fandango's ratings for popular movies in 2016. We also determined that, on average, popular movies released in 2016 were rated lower on Fandango than popular movies released in 2015.\n", "\n", "We cannot be completely sure what caused the change, but the chances are very high that it was caused by Fandango fixing the biased rating system after Hickey's analysis. 
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.8" } }, "nbformat": 4, "nbformat_minor": 2 }