{ "cells": [ { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "# Analyzing Star Wars Survey Data\n", "\n", "## Introduction\n", "\n", "In this project, we'll explore and clean an existing survey of Star Wars fans, collected by FiveThirtyEight with the online tool SurveyMonkey, to answer the following question: *does the rest of America realize that “The Empire Strikes Back” is clearly the best of the bunch?* The data contains 835 total responses and can be downloaded from FiveThirtyEight's [GitHub repository](https://github.com/fivethirtyeight/data/tree/master/star-wars-survey).\n", "\n", "The data has several columns, including:\n", "\n", "- RespondentID - An anonymized ID for the respondent (person taking the survey)\n", "- Gender - The respondent's gender\n", "- Age - The respondent's age\n", "- Household Income - The respondent's income\n", "- Education - The respondent's education level\n", "- Location (Census Region) - The respondent's location\n", "- Have you seen any of the 6 films in the Star Wars franchise? - Has a Yes or No response\n", "- Do you consider yourself to be a fan of the Star Wars film franchise? - Has a Yes or No response\n", "\n", "### Summary of results\n", "\n", "After analyzing the data, we found that Episode V, “The Empire Strikes Back”, is both the most-seen and the best-ranked episode among the respondents. In general, the earlier-released original trilogy (Episodes IV-VI) seems to be more popular than Episodes I-III.\n", "\n", "### Exploring the data" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " RespondentID Have you seen any of the 6 films in the Star Wars franchise? \\\n", "0 NaN Response \n", "1 3.292880e+09 Yes \n", "2 3.292880e+09 No \n", "\n", " Do you consider yourself to be a fan of the Star Wars film franchise? \\\n", "0 Response \n", "1 Yes \n", "2 NaN \n", "\n", " Which of the following Star Wars films have you seen? Please select all that apply. \\\n", "0 Star Wars: Episode I The Phantom Menace \n", "1 Star Wars: Episode I The Phantom Menace \n", "2 NaN \n", "\n", " Unnamed: 4 \\\n", "0 Star Wars: Episode II Attack of the Clones \n", "1 Star Wars: Episode II Attack of the Clones \n", "2 NaN \n", "\n", " Unnamed: 5 \\\n", "0 Star Wars: Episode III Revenge of the Sith \n", "1 Star Wars: Episode III Revenge of the Sith \n", "2 NaN \n", "\n", " Unnamed: 6 \\\n", "0 Star Wars: Episode IV A New Hope \n", "1 Star Wars: Episode IV A New Hope \n", "2 NaN \n", "\n", " Unnamed: 7 \\\n", "0 Star Wars: Episode V The Empire Strikes Back \n", "1 Star Wars: Episode V The Empire Strikes Back \n", "2 NaN \n", "\n", " Unnamed: 8 \\\n", "0 Star Wars: Episode VI Return of the Jedi \n", "1 Star Wars: Episode VI Return of the Jedi \n", "2 NaN \n", "\n", " Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film. \\\n", "0 Star Wars: Episode I The Phantom Menace \n", "1 3 \n", "2 NaN \n", "\n", " ... Unnamed: 28 Which character shot first? \\\n", "0 ... Yoda Response \n", "1 ... Very favorably I don't understand this question \n", "2 ... NaN NaN \n", "\n", " Are you familiar with the Expanded Universe? \\\n", "0 Response \n", "1 Yes \n", "2 NaN \n", "\n", " Do you consider yourself to be a fan of the Expanded Universe?ξ \\\n", "0 Response \n", "1 No \n", "2 NaN \n", "\n", " Do you consider yourself to be a fan of the Star Trek franchise? 
Gender \\\n", "0 Response Response \n", "1 No Male \n", "2 Yes Male \n", "\n", " Age Household Income Education Location (Census Region) \n", "0 Response Response Response Response \n", "1 18-29 NaN High school degree South Atlantic \n", "2 18-29 $0 - $24,999 Bachelor degree West South Central \n", "\n", "[3 rows x 38 columns]" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "# Read in the star_wars dataframe\n", "star_wars = pd.read_csv(\"star_wars.csv\", encoding=\"ISO-8859-1\")\n", "star_wars.head(3)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['RespondentID',\n", " 'Have you seen any of the 6 films in the Star Wars franchise?',\n", " 'Do you consider yourself to be a fan of the Star Wars film franchise?',\n", " 'Which of the following Star Wars films have you seen? Please select all that apply.',\n", " 'Unnamed: 4', 'Unnamed: 5', 'Unnamed: 6', 'Unnamed: 7', 'Unnamed: 8',\n", " 'Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film.',\n", " 'Unnamed: 10', 'Unnamed: 11', 'Unnamed: 12', 'Unnamed: 13',\n", " 'Unnamed: 14',\n", " 'Please state whether you view the following characters favorably, unfavorably, or are unfamiliar with him/her.',\n", " 'Unnamed: 16', 'Unnamed: 17', 'Unnamed: 18', 'Unnamed: 19',\n", " 'Unnamed: 20', 'Unnamed: 21', 'Unnamed: 22', 'Unnamed: 23',\n", " 'Unnamed: 24', 'Unnamed: 25', 'Unnamed: 26', 'Unnamed: 27',\n", " 'Unnamed: 28', 'Which character shot first?',\n", " 'Are you familiar with the Expanded Universe?',\n", " 'Do you consider yourself to be a fan of the Expanded Universe?ξ',\n", " 'Do you consider yourself to be a fan of the Star Trek franchise?',\n", " 'Gender', 'Age', 'Household Income', 'Education',\n", " 'Location (Census Region)'],\n", " dtype='object')" ] }, "execution_count": 2, "metadata": {}, 
"output_type": "execute_result" } ], "source": [ "# Review the column names\n", "star_wars.columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Cleaning and Transforming the dataset\n", "First, we'll need to remove the invalid rows. For example, RespondentID is supposed to be a unique ID for each respondent, but it's blank in some rows. Let's remove any rows with an invalid RespondentID.\n", "\n", "#### Cleaning RespondentID Column" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " RespondentID Have you seen any of the 6 films in the Star Wars franchise? \\\n", "1 3.292880e+09 Yes \n", "2 3.292880e+09 No \n", "3 3.292765e+09 Yes \n", "4 3.292763e+09 Yes \n", "5 3.292731e+09 Yes \n", "\n", " Do you consider yourself to be a fan of the Star Wars film franchise? \\\n", "1 Yes \n", "2 NaN \n", "3 No \n", "4 Yes \n", "5 Yes \n", "\n", " Which of the following Star Wars films have you seen? Please select all that apply. \\\n", "1 Star Wars: Episode I The Phantom Menace \n", "2 NaN \n", "3 Star Wars: Episode I The Phantom Menace \n", "4 Star Wars: Episode I The Phantom Menace \n", "5 Star Wars: Episode I The Phantom Menace \n", "\n", " Unnamed: 4 \\\n", "1 Star Wars: Episode II Attack of the Clones \n", "2 NaN \n", "3 Star Wars: Episode II Attack of the Clones \n", "4 Star Wars: Episode II Attack of the Clones \n", "5 Star Wars: Episode II Attack of the Clones \n", "\n", " Unnamed: 5 \\\n", "1 Star Wars: Episode III Revenge of the Sith \n", "2 NaN \n", "3 Star Wars: Episode III Revenge of the Sith \n", "4 Star Wars: Episode III Revenge of the Sith \n", "5 Star Wars: Episode III Revenge of the Sith \n", "\n", " Unnamed: 6 \\\n", "1 Star Wars: Episode IV A New Hope \n", "2 NaN \n", "3 NaN \n", "4 Star Wars: Episode IV A New Hope \n", "5 Star Wars: Episode IV A New Hope \n", "\n", " Unnamed: 7 \\\n", "1 Star Wars: Episode V The Empire Strikes Back \n", "2 NaN \n", "3 NaN \n", "4 Star Wars: Episode V The Empire Strikes Back \n", "5 Star Wars: Episode V The Empire Strikes Back \n", "\n", " Unnamed: 8 \\\n", "1 Star Wars: Episode VI Return of the Jedi \n", "2 NaN \n", "3 NaN \n", "4 Star Wars: Episode VI Return of the Jedi \n", "5 Star Wars: Episode VI Return of the Jedi \n", "\n", " Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film. \\\n", "1 3 \n", "2 NaN \n", "3 1 \n", "4 5 \n", "5 5 \n", "\n", " ... Unnamed: 28 \\\n", "1 ... 
Very favorably \n", "2 ... NaN \n", "3 ... Unfamiliar (N/A) \n", "4 ... Very favorably \n", "5 ... Somewhat favorably \n", "\n", " Which character shot first? \\\n", "1 I don't understand this question \n", "2 NaN \n", "3 I don't understand this question \n", "4 I don't understand this question \n", "5 Greedo \n", "\n", " Are you familiar with the Expanded Universe? \\\n", "1 Yes \n", "2 NaN \n", "3 No \n", "4 No \n", "5 Yes \n", "\n", " Do you consider yourself to be a fan of the Expanded Universe?ξ \\\n", "1 No \n", "2 NaN \n", "3 NaN \n", "4 NaN \n", "5 No \n", "\n", " Do you consider yourself to be a fan of the Star Trek franchise? Gender \\\n", "1 No Male \n", "2 Yes Male \n", "3 No Male \n", "4 Yes Male \n", "5 No Male \n", "\n", " Age Household Income Education \\\n", "1 18-29 NaN High school degree \n", "2 18-29 $0 - $24,999 Bachelor degree \n", "3 18-29 $0 - $24,999 High school degree \n", "4 18-29 $100,000 - $149,999 Some college or Associate degree \n", "5 18-29 $100,000 - $149,999 Some college or Associate degree \n", "\n", " Location (Census Region) \n", "1 South Atlantic \n", "2 West South Central \n", "3 West North Central \n", "4 West North Central \n", "5 West North Central \n", "\n", "[5 rows x 38 columns]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "star_wars = star_wars[pd.notnull(star_wars['RespondentID'])]\n", "star_wars.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Cleaning and Mapping Yes/No Columns\n", "The following columns represent Yes/No questions. They can also be NaN where a respondent chooses not to answer a question:\n", "\n", "- Have you seen any of the 6 films in the Star Wars franchise?\n", "- Do you consider yourself to be a fan of the Star Wars film franchise?\n", "\n", "We will convert each column to a Boolean having only the values True, False, and NaN in order to make the data a bit easier to analyze." 
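, "\n", "\n", "As a minimal sketch of this mapping on made-up data (the Series name `answers` is hypothetical and not part of the survey), `Series.map` with a dictionary converts the known answers and leaves missing ones as NaN:\n", "\n", "```python\n", "import numpy as np\n", "import pandas as pd\n", "\n", "# Hypothetical toy answers, for illustration only\n", "answers = pd.Series([\"Yes\", \"No\", np.nan, \"Yes\"])\n", "\n", "# Keys found in the dictionary are replaced; NaN has no key, so it stays NaN\n", "as_bool = answers.map({\"Yes\": True, \"No\": False})\n", "print(as_bool.tolist())\n", "```"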
] }, { "cell_type": "code", "execution_count": 4, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "True 936\n", "False 250\n", "Name: Have you seen any of the 6 films in the Star Wars franchise?, dtype: int64\n", "True 552\n", "NaN 350\n", "False 284\n", "Name: Do you consider yourself to be a fan of the Star Wars film franchise?, dtype: int64\n" ] } ], "source": [ "# Map the Yes/No answers to booleans; missing answers stay NaN\n", "yes_no = {\"Yes\": True, \"No\": False}\n", "columns_to_boolean = ['Have you seen any of the 6 films in the Star Wars franchise?', 'Do you consider yourself to be a fan of the Star Wars film franchise?']\n", "for column in columns_to_boolean:\n", "    star_wars[column] = star_wars[column].map(yes_no)\n", "    print(star_wars[column].value_counts(dropna=False))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Cleaning and Mapping Checkbox Columns\n", "The columns at positions 3 through 8 (six columns, counting from 0) represent a single checkbox question. The respondent checked off a series of boxes in response to the question *Which of the following Star Wars films have you seen? Please select all that apply.*\n", "\n", "For each of these columns, if the value in a cell is the name of the movie, that means the respondent saw the movie. If the value is NaN, the respondent either didn't answer or didn't see the movie. We'll assume that they didn't see the movie."
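, "\n", "\n", "One way to picture that assumption, sketched on a made-up column (the name `seen_example` is hypothetical), is to test for non-missing values with `Series.notnull()`, which gives the same True/False result as mapping each title to True and NaN to False:\n", "\n", "```python\n", "import numpy as np\n", "import pandas as pd\n", "\n", "# Hypothetical checkbox column: the movie title if checked, NaN otherwise\n", "seen_example = pd.Series([\"Star Wars: Episode I The Phantom Menace\", np.nan])\n", "\n", "# A non-missing title becomes True; NaN becomes False\n", "print(seen_example.notnull().tolist())\n", "```"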
] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "# A movie title means the respondent saw it; a missing value is treated as not seen\n", "movies_map = {\n", "    \"Star Wars: Episode I The Phantom Menace\": True,\n", "    np.nan: False,\n", "    \"Star Wars: Episode II Attack of the Clones\": True,\n", "    \"Star Wars: Episode III Revenge of the Sith\": True,\n", "    \"Star Wars: Episode IV A New Hope\": True,\n", "    \"Star Wars: Episode V The Empire Strikes Back\": True,\n", "    \"Star Wars: Episode VI Return of the Jedi\": True\n", "}\n", "\n", "for col in star_wars.columns[3:9]:\n", "    star_wars[col] = star_wars[col].map(movies_map)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We'll need to rename the columns to better reflect what they represent." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "star_wars = star_wars.rename(columns={\n", "    \"Which of the following Star Wars films have you seen? Please select all that apply.\": \"seen_1\",\n", "    \"Unnamed: 4\": \"seen_2\",\n", "    \"Unnamed: 5\": \"seen_3\",\n", "    \"Unnamed: 6\": \"seen_4\",\n", "    \"Unnamed: 7\": \"seen_5\",\n", "    \"Unnamed: 8\": \"seen_6\",\n", "})" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "True 673\n", "False 513\n", "Name: seen_1, dtype: int64\n", "False 615\n", "True 571\n", "Name: seen_2, dtype: int64\n", "False 636\n", "True 550\n", "Name: seen_3, dtype: int64\n", "True 607\n", "False 579\n", "Name: seen_4, dtype: int64\n", "True 758\n", "False 428\n", "Name: seen_5, dtype: int64\n", "True 738\n", "False 448\n", "Name: seen_6, dtype: int64\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " RespondentID Have you seen any of the 6 films in the Star Wars franchise? \\\n", "1 3.292880e+09 True \n", "2 3.292880e+09 False \n", "3 3.292765e+09 True \n", "\n", " Do you consider yourself to be a fan of the Star Wars film franchise? \\\n", "1 True \n", "2 NaN \n", "3 False \n", "\n", " seen_1 seen_2 seen_3 seen_4 seen_5 seen_6 \\\n", "1 True True True True True True \n", "2 False False False False False False \n", "3 True True True False False False \n", "\n", " Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film. \\\n", "1 3 \n", "2 NaN \n", "3 1 \n", "\n", " ... Unnamed: 28 \\\n", "1 ... Very favorably \n", "2 ... NaN \n", "3 ... Unfamiliar (N/A) \n", "\n", " Which character shot first? \\\n", "1 I don't understand this question \n", "2 NaN \n", "3 I don't understand this question \n", "\n", " Are you familiar with the Expanded Universe? \\\n", "1 Yes \n", "2 NaN \n", "3 No \n", "\n", " Do you consider yourself to be a fan of the Expanded Universe?ξ \\\n", "1 No \n", "2 NaN \n", "3 NaN \n", "\n", " Do you consider yourself to be a fan of the Star Trek franchise? Gender \\\n", "1 No Male \n", "2 Yes Male \n", "3 No Male \n", "\n", " Age Household Income Education Location (Census Region) \n", "1 18-29 NaN High school degree South Atlantic \n", "2 18-29 $0 - $24,999 Bachelor degree West South Central \n", "3 18-29 $0 - $24,999 High school degree West North Central \n", "\n", "[3 rows x 38 columns]" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "for col in star_wars.columns[3:9]:\n", "    print(star_wars[col].value_counts(dropna=False))\n", "star_wars.head(3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Cleaning the Ranking Columns\n", "\n", "The next six columns (positions 9 through 14, counting from 0) ask the respondent to rank the Star Wars movies from most favorite to least favorite. 
1 means the film was the most favorite, and 6 means it was the least favorite. We'll need to convert each column to a numeric type, though, then rename the columns so that we can tell what they represent more easily." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['RespondentID',\n", " 'Have you seen any of the 6 films in the Star Wars franchise?',\n", " 'Do you consider yourself to be a fan of the Star Wars film franchise?',\n", " 'seen_1', 'seen_2', 'seen_3', 'seen_4', 'seen_5', 'seen_6', 'ranking_1',\n", " 'ranking_2', 'ranking_3', 'ranking_4', 'ranking_5', 'ranking_6',\n", " 'Please state whether you view the following characters favorably, unfavorably, or are unfamiliar with him/her.',\n", " 'Unnamed: 16', 'Unnamed: 17', 'Unnamed: 18', 'Unnamed: 19',\n", " 'Unnamed: 20', 'Unnamed: 21', 'Unnamed: 22', 'Unnamed: 23',\n", " 'Unnamed: 24', 'Unnamed: 25', 'Unnamed: 26', 'Unnamed: 27',\n", " 'Unnamed: 28', 'Which character shot first?',\n", " 'Are you familiar with the Expanded Universe?',\n", " 'Do you consider yourself to be a fan of the Expanded Universe?ξ',\n", " 'Do you consider yourself to be a fan of the Star Trek franchise?',\n", " 'Gender', 'Age', 'Household Income', 'Education',\n", " 'Location (Census Region)'],\n", " dtype='object')" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Convert each column to a numeric type\n", "star_wars[star_wars.columns[9:15]] = star_wars[star_wars.columns[9:15]].astype(float)\n", "\n", "# Rename the columns\n", "star_wars = star_wars.rename(columns={\n", " \"Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film.\": \"ranking_1\",\n", " \"Unnamed: 10\": \"ranking_2\",\n", " \"Unnamed: 11\": \"ranking_3\",\n", " \"Unnamed: 12\": \"ranking_4\",\n", " \"Unnamed: 13\": \"ranking_5\",\n", " \"Unnamed: 14\": \"ranking_6\"\n", " 
})\n", "\n", "star_wars.columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Analysing the dataset\n", "\n", "#### Finding the highest-ranked movie" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "ranking_1 3.732934\n", "ranking_2 4.087321\n", "ranking_3 4.341317\n", "ranking_4 3.272727\n", "ranking_5 2.513158\n", "ranking_6 3.047847\n", "dtype: float64\n" ] }, { "data": { "text/plain": [ "Text(0.5,0,'Rankings')" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import matplotlib.pyplot as plt\n", "%matplotlib inline\n", "\n", "# Mean rank per movie; a lower value means a better ranking\n", "mean_ranking = star_wars.loc[:,'ranking_1':'ranking_6'].mean()\n", "print(mean_ranking)\n", "\n", "mean_ranking.plot.bar()\n", "plt.title('Average ranking for each movie')\n", "plt.ylabel('Average Ranking')\n", "plt.xlabel('Rankings')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Because a rank of 1 means most favorite, a lower mean ranking is better. Episode V has the lowest mean ranking (about 2.51), so it is the respondents' favorite. In general, the original trilogy (Episodes IV-VI) is ranked better than Episodes I-III.\n", "\n", "#### Finding the most viewed movie\n", "We can find out how many people have seen each movie by summing each `seen_` column, since each `True` counts as 1." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Text(0.5,0,'Seen columns by movie')" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "
mVml6ulBdN7l7XO5sgDe1vvhmJlZs6jnYn07NSIQMzNrLj0mCEmnFJVHxBW9H46ZmTWLeg4x7Zeb3hQ4DJgJOEGYmbWxeg4xnZOfl7Q1q+8u1yVJmwJ3AZuk/VwXERdI2gn4KTCALNGcHBGvS9qELOnsCzwPHB8R89auOWZm1lvW5XLfK4ARddR7DTg0IvYG9gGOkHQAcBFwSUSMAF4ETk/1TwdejIhdgEtSPTMzq0g9l/v+haQp6fEr4Angxp7Wi8wrabZfegRwKHBdKp8MHJOmR6d50vLDJKnulpiZWa+qZwzi33LTK4H5EbGwno1L6gPMAHYBvgM8BbwUEStTlYXA4DQ9GFgAEBErJS0DtgWW1rMvMzPrXT32INJF+x4nu6LrNsDr9W48It6MiH2AIcD+wO5F1dJzUW8hagskjZM0XdL0JUuW1BuKmZmtpXoOMR0H3AeMAY4D7pW0Vpf7joiXgGnAAUB/SZ09lyHAojS9EBia9tkX2Bp4oWBbEyNiVESM6ujoWJswzMxsLdQzSH0+sF9EjI2IU8h6Al/uaSVJHZL6p+nNgMOBWcAdrL6fxFhWj2dMYfWvto8Fbo+INXoQZmbWGPWMQWwUEc/l5p+nvsQyCJicxiE2Aq6JiF9Kegz4qaSvAQ8Al6f6lwM/kjSbrOdwQr2NMDOz3ldPgrhZ0i3AVWn+eODXPa0UEQ8B7ywon0PWC6kt/wvZYSwzM2sC9fxQ7nOSPgwcTDaQPDEibig9MjMzq1SXCULSLsD2EfG7iLgeuD6Vv0fSzhHxVKOCNDOzxutuLOFbwPKC8hVpmZmZtbHuEsTwNI7wv0TEdLL7U5uZWRvrLkFs2s2yzXo7EDMzay7dJYj7JX2itlDS6WSXzzAzszbW3VlM5wI3SDqJ1QlhFLAx8KGyAzMzs2p1mSAi4lngXZLeB/xNKv5VRNzekMjMzKxS9fwO4g6yy2OYmdkGZF1uGGRmZhsAJwgzMyvkBGFmZoWcIMzMrJAThJmZFXKCMDOzQk4QZmZWyAnCzMwKOUGYmVmh0hKEpKGS7pA0S9Kjkj6dygdIulXSk+l5m1QuSZdKmi3pIUkjy4rNzMx6VmYPYiXwmYjYHTgAOFvSHsB4YGpEjACmpnmAI4ER6TEOmFBibGZm1oPSEkRELI6ImWl6OTALGAyMBianapOBY9L0aOCKyNwD9Jc0qKz4zMysew0Zg5A0HHgncC/Zfa4XQ5ZEgO1StcHAgtxqC1OZmZlVoPQEIWkL4GfAuRHxcndVC8qiYHvjJE2XNH3JkiW9FaaZmdUoNUFI6keWHH4SEden4mc7Dx2l5+dS+UJgaG71IcCi2m1GxMSIGBURozo6OsoL3sxsA1fmWUwCLgdmRcS/5xZNAcam6bHAjbnyU9LZTAcAyzoPRZmZWeP1eMOg9XAQcDLwsKQHU9kXgQuBa9K9rZ8GxqRlNwFHAbOBFcBpJcZmZmY9KC1BRMRvKR5XADisoH4AZ5cVj5mZrR3/ktrMzAo5QZiZWSEnCDMzK+QEYWZmhZwgzMyskBOEmZkVcoIwM7NCThBmZlbICcLMzAo5QZiZWSEnCDMzK+QEYWZmhZwgzMyskBOEmZkVcoIwM7NCThBmZlbICcLMzAo5QZiZWaHSEoSkH0h6TtIjubIBkm6V9GR63iaVS9KlkmZLekjSyLLiMjOz+pTZg5gEHFFTNh6YGhEjgKlpHuBIYER6jAMmlBiXmZnVobQEERF3AS/UFI8GJqfpycAxufIrInMP0F/SoLJiMzOznjV6DGL7iFgMkJ63S+WDgQW5egtTmZmZVaRZBqlVUBaFFaVxkqZLmr5kyZKSwzIz23A1OkE823noKD0/l8oXAkNz9YYAi4o2EBETI2JURIzq6OgoNVgzsw1ZoxPEFGBsmh4L3JgrPyWdzXQAsKzzUJSZmVWjb1kblnQVcAgwUNJC4ALgQuAaSacDTwNjUvWbgKOA2cAK4LSy4jIzs/qUliAi4sQuFh1WUDeA
s8uKxczM1l6zDFKbmVmTcYIwM7NCThBmZlbICcLMzAo5QZiZWSEnCDMzK+QEYWZmhZwgzMyskBOEmZkVcoIwM7NCThBmZlbICcLMzAo5QZiZWSEnCDMzK+QEYWZmhZwgzMyskBOEmZkVcoIwM7NCThBmZlaoqRKEpCMkPSFptqTxVcdjZrYha5oEIakP8B3gSGAP4ERJe1QblZnZhqtpEgSwPzA7IuZExOvAT4HRFcdkZrbBaqYEMRhYkJtfmMrMzKwCioiqYwBA0hjg7yPi42n+ZGD/iDinpt44YFyafTvwRAPDHAgsbeD+Gs3ta13t3DZw+3rbjhHR0VOlvo2IpE4LgaG5+SHAotpKETERmNiooPIkTY+IUVXsuxHcvtbVzm0Dt68qzXSI6X5ghKSdJG0MnABMqTgmM7MNVtP0ICJipaRPArcAfYAfRMSjFYdlZrbBapoEARARNwE3VR1HNyo5tNVAbl/raue2gdtXiaYZpDYzs+bSTGMQZmbWRJwgzMyskBOEmZkVcoJYB5L+ruoYeoOkrSTtXFC+VxXx9DZJO0jaIU13SPqwpD2rjqsMkr5RdQxlSae+f1jSblXH0hskDZO0aZqWpNMkfVvSP0pqqhOHPEi9DiQ9HRHDqo5jfUg6DvgW8BzQDzg1Iu5Py2ZGxMgq41tfks4AxgMCLgJOBR4FDgK+GRGXVxfd+pF0aW0RcDJwBUBEfKrhQfUiST+PiGPS9Giyv9NpwLuA/x8Rk6qLbv1JeoTsKhErJF0E7Az8HDgUICI+VmV8eU2VrZqJpK5+pCdg20bGUpIvAvtGxGJJ+wM/kvTFiLierI2t7pPAnsBmwHxgl4h4RtI2wB1AyyYI4MNkH5i/YfV7dQIwo6qAetmOuenPA4dGxFxJA4GpwKRKouo9G0XEijR9OLBfRKwCfizpDxXGtQYniK69G/go8EpNuciuPNvq+kTEYoCIuE/S+4BfShoCtEO38o30T7hC0lMR8QxARLwoqdXbtzvw/4AjgM9FxJ8kXRARkyuOq7fk35++ETEXICKWSlpVUUy9aYGkQyPidmAe2SWG5ktqui+eThBduwdYERF31i6Q1MgLBJZluaSdI+IpgNSTOISsq9sOx+lXSeoXEW8AH+gsTMd+W3rsLSKWA+dK2pfsW+evaPE21dhb0stkX8Y2kbRD6v1tTHaVhVb3ceAKSV8FlgEPSnoA2AY4r8rAankMYgMlaW/g1YiYXVPeDzguIn5STWS9Q9IwYFFErKwpHwzsHhG3VRNZ75Ik4CzgwIj4aNXxlElSf7L37r+rjqU3SNod2JXsi/pC4P50qKlpOEGsJ0n/HREHVh1HWdy+1tXObQO3rxHaqVtalU2rDqBkbl/raue2gdtXOieI9dfuXTC3r3W1c9vA7SudE4SZmRVyglh/7fCbge64fa2rndsGbl/pnCDW38lVB1Ayt691tXPbwO0rnRNED9I1YJ6UtEzSy5KWp3O0AYiIR6qMb325fa3bvnZuG7h9zdA+n+baA0mzgX+IiFlVx1IGt691tXPbwO1rBu5B9OzZZn4De4Hb17rauW3g9lXOPYgeSPoPYAeyS1C81lmeLmrX8ty+1tXObQO3rxn4Wkw92wpYAbw/VxZA07yJ68nta13t3DZw+yrnHoSZmRXyGEQPJO0qaWq6yQeS9pL0parj6i1uX+tq57aB29cMnCB69n3gC8AbABHxENnNWdqF29e62rlt4PZVzgmiZ2+JiPtqylYW1mxNbl/raue2gdtXOSeIni2VtDPpwlmSjgUWVxtSr3L7Wlc7tw3cvsp5kLoHkt4GTCS7YfqLwFzgoxExr8q4eovb17rauW3g9jUDJ4g6Sdqc7Gbjy6uOpQxuX+tq57aB21clH2LqgaTtJV0OXBcRyyXtIen0quPqLW5f62rntoHb1wycIHo2CbgFeGua/yNwbmXR9L5JuH2tahLt2zZw+yrnBNGzgRFxDbAKICJWAm9WG1KvcvtaVzu3Ddy+
yjlB9OxVSduy+kyDA4Bl1YbUq9y+1tXObQO3r3K+FlPPzgOmADtL+h3QARxbbUi9yu1rXe3cNnD7KuceRM92Bo4kOxXtFuBJ2iuxun2tq50i9KFQAAAFIElEQVTbBm5f5ZwgevbliHgZ2AY4nOy85QnVhtSr3L7W1c5tA7evck4QPescNPoA8N2IuBHYuMJ4epvb17rauW3g9lXOCaJnf5L0PeA44CZJm9Ber5vb17rauW3g9lXOv6TugaS3AEcAD0fEk5IGAe+IiN9UHFqvcPtaVzu3Ddy+ZuAEYWZmhZqqO2NmZs3DCcLMzAo5QVhDSTpf0qOSHpL0oKS/rTqmTpK+KumzTRDHK1XHkCfprZKuqzoOa7ym+lGGtTdJBwIfBEZGxGuSBtJkp/XZmiJiEU32C19rDPcgrJEGAUsj4jWAiFiaPnyQtK+kOyXNkHRLOqMDSTtLujmV3y1pt1Q+SdKlkn4vaU66G9caJJ2Seit/kPSjVLZjuln8Q+l5WMF60ySNStMDJc1L06dK+rmkX0iaK+mTks6T9ICkeyQNyK1/kaT7JP1R0rtT+Z6p7MG0/xFdxH2xpJkpvo70OszMLR8haUYXcV8i6S5JsyTtJ+l6SU9K+lqu3nmSHkmPc1PZRZLOytX5qqTPSBou6ZFU1kfSv0q6P8V/RldvtrWBiPDDj4Y8gC2AB8kua/xfwHtTeT/g90BHmj8e+EGangqMSNN/C9yepicB15J9ydkDmF2wvz2BJ8iumgkwID3/Ahibpj8G/DxNfxX4bJqeBoxK0wOBeWn6VGA2sCXZtXOWAWemZZcA5+bWvzhNHwXclqa/DZyUpjcGNiuIO3J1vgL8Z5q+A9gnTX8DOKdg3WnARWn608AissS8CbAQ2BbYF3gY2Dy9J48C70yPO3PbegwYBgwHHkll44AvpelNgOnATlX/bflRzsOHmKxhIuIVSfsC7wbeB1wtaTzZh8zfALdKAugDLJa0Bdl1aq5N5ZB9KHX6eUSsAh6TtH3BLg8luxnL0rT/F1L5gcCH0/SPgG+uZVPuiOzuX8slLSNLOJB96O6Vq3d9ep5B9iEL8N/A+ZKGANdHxJMF218FXJ2mf5zbzmXAaZLOI0ui+3cR35RcPI9GxGIASXOAocDBwA0R8Woqvx54d0RcKmk7SW8lS34vRsTTkobntv1+YK9cj21rYATZ7TKtzThBWENFxJtk33KnSXoYGEv2AfpoRByYrytpK+CliNini829lq9esFykSyn3FFZB2UpWH4LdtJv9rsrNr+J//091lr/ZWR4RV0q6l+zyCrdI+nhE3F5nfD8DLgBuB2ZExPNd1M/HUxtrX4pfq07XkY037AD8tGC5yHout/QQs7UBj0FYw0h6e80x932A+WSHgTrSIDaS+knaM7ILmc2VNCaVS9Lea7HLqcBxyq65T+f4ANnhrBPS9EnAbwvWnUd2KAZ6cYBW2Y3q50TEpWTf9PcqqLZRbp8f6YwvIv5CdtXPCcAP1yOMu4BjJL1F2f2QPwTcnZb9lOy1OZYsWdS6BfhHSf1Se3ZN27A25B6ENdIWwLcl9Sf7hj4bGBcRr6dDFpdK2prs7/JbZMfGTwImSPoS2VjFT4E/1LOziHhU0teBOyW9CTxANobwKeAHkj4HLAFOK1j934BrJJ1M9o29txwPfFTSG8AzwL8U1HkV2DMNQi9L63T6CdnhsXW+HENEzJQ0CbgvFV0WEQ+kZY9K2hL4U+ehqRqXkR0um6nsuN8S4Jh1jcWamy+1YdZClP1OY+uI+HLVsVj7cw/CrEVIuoHsJjOHVh2LbRjcgzAzs0IepDYzs0JOEGZmVsgJwszMCjlBmJlZIScIMzMr5ARhZmaF/gfp0ZuVysChHgAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "sum_seen_movies = star_wars[star_wars.columns[3:9]].sum()\n", "sum_seen_movies.plot.bar()\n", "plt.title('Total of people that have seen each movie')\n", "plt.ylabel('Count of people')\n", "plt.xlabel('Seen columns by movie')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The most-seen movies are the fifth and the sixth, which were also the best ranked.\n", "\n", "### Exploring the data by gender segment\n", "\n", "Let's examine how certain segments of the survey population responded. Several columns segment our data into two groups. Here are a few examples:\n", "\n", "- Do you consider yourself to be a fan of the Star Wars film franchise? - True or False\n", "- Do you consider yourself to be a fan of the Star Trek franchise? - Yes or No\n", "- Gender - Male or Female\n", "\n", "We can split a dataframe into two groups based on a binary column by creating one subset of rows for each value of that column. We will split on the Gender column to find the most viewed movie and the highest-ranked movie for each gender group." ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "Text(0.5,1,'Total of females that have seen each movie')" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "males = star_wars[star_wars[\"Gender\"] == \"Male\"]\n", "females = star_wars[star_wars[\"Gender\"] == \"Female\"]\n", "\n", "fig = plt.figure(figsize=(15, 3))\n", "ax = fig.add_subplot(1,2,1)\n", "males[males.columns[3:9]].sum().plot.bar(ax=ax)\n", "ax.set_title('Total of males that have seen each movie')\n", "ax = fig.add_subplot(1,2,2)\n", "females[females.columns[3:9]].sum().plot.bar(ax=ax)\n", "ax.set_title('Total of females that have seen each movie')\n" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Text(0.5,1,'Movies ranked by females')" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "fig = plt.figure(figsize=(15, 3))\n", "ax = fig.add_subplot(1,2,1)\n", "males[males.columns[9:15]].mean().plot.bar(ax=ax)\n", "ax.set_title('Movies ranked by males')\n", "ax = fig.add_subplot(1,2,2)\n", "females[females.columns[9:15]].mean().plot.bar(ax=ax)\n", "ax.set_title('Movies ranked by females')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "More males than females have seen each episode, and the difference is largest for Episodes 1 to 3. However, males gave these episodes worse rankings than females did. The older movies are the most watched and best ranked for both genders.\n", "\n", "## Conclusion\n", "\n", "In this project, we explored data from an existing survey of Star Wars fans conducted with the online tool SurveyMonkey. We conclude that Episode 5, “The Empire Strikes Back”, is the most seen and best ranked episode among the respondents. In general, the earlier movies seem to be more popular." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" } }, "nbformat": 4, "nbformat_minor": 1 }