{ "cells": [ { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "# Analyzing Star Wars Survey Data\n", "\n", "## Introduction\n", "\n", "In this project, we'll explore and clean existing survey data that FiveThirtyEight collected from Star Wars fans using the online tool SurveyMonkey, in order to answer the following question: *does the rest of America realize that “The Empire Strikes Back” is clearly the best of the bunch?* The data contains 835 total responses and can be downloaded from FiveThirtyEight's [GitHub repository](https://github.com/fivethirtyeight/data/tree/master/star-wars-survey).\n", "\n", "The data has several columns, including:\n", "\n", "- RespondentID - An anonymized ID for the respondent (person taking the survey)\n", "- Gender - The respondent's gender\n", "- Age - The respondent's age\n", "- Household Income - The respondent's household income\n", "- Education - The respondent's education level\n", "- Location (Census Region) - The respondent's location\n", "- Have you seen any of the 6 films in the Star Wars franchise? - Has a Yes or No response\n", "- Do you consider yourself to be a fan of the Star Wars film franchise? - Has a Yes or No response\n", "\n", "### Summary of results\n", "\n", "After analyzing the data, we found that Episode V, “The Empire Strikes Back”, is both the most-seen and the best-ranked episode among the respondents. In general, the earlier movies seem to be more popular.\n", "\n", "### Exploring the data" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
| | RespondentID | Have you seen any of the 6 films in the Star Wars franchise? | Do you consider yourself to be a fan of the Star Wars film franchise? | Which of the following Star Wars films have you seen? Please select all that apply. | Unnamed: 4 | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 | Unnamed: 8 | Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film. | ... | Unnamed: 28 | Which character shot first? | Are you familiar with the Expanded Universe? | Do you consider yourself to be a fan of the Expanded Universe? | Do you consider yourself to be a fan of the Star Trek franchise? | Gender | Age | Household Income | Education | Location (Census Region) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | NaN | Response | Response | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | Star Wars: Episode IV A New Hope | Star Wars: Episode V The Empire Strikes Back | Star Wars: Episode VI Return of the Jedi | Star Wars: Episode I The Phantom Menace | ... | Yoda | Response | Response | Response | Response | Response | Response | Response | Response | Response |
| 1 | 3.292880e+09 | Yes | Yes | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | Star Wars: Episode IV A New Hope | Star Wars: Episode V The Empire Strikes Back | Star Wars: Episode VI Return of the Jedi | 3 | ... | Very favorably | I don't understand this question | Yes | No | No | Male | 18-29 | NaN | High school degree | South Atlantic |
| 2 | 3.292880e+09 | No | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | Yes | Male | 18-29 | $0 - $24,999 | Bachelor degree | West South Central |

3 rows × 38 columns
\n", "\n", "
| | RespondentID | Have you seen any of the 6 films in the Star Wars franchise? | Do you consider yourself to be a fan of the Star Wars film franchise? | Which of the following Star Wars films have you seen? Please select all that apply. | Unnamed: 4 | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 | Unnamed: 8 | Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film. | ... | Unnamed: 28 | Which character shot first? | Are you familiar with the Expanded Universe? | Do you consider yourself to be a fan of the Expanded Universe? | Do you consider yourself to be a fan of the Star Trek franchise? | Gender | Age | Household Income | Education | Location (Census Region) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 3.292880e+09 | Yes | Yes | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | Star Wars: Episode IV A New Hope | Star Wars: Episode V The Empire Strikes Back | Star Wars: Episode VI Return of the Jedi | 3 | ... | Very favorably | I don't understand this question | Yes | No | No | Male | 18-29 | NaN | High school degree | South Atlantic |
| 2 | 3.292880e+09 | No | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | Yes | Male | 18-29 | $0 - $24,999 | Bachelor degree | West South Central |
| 3 | 3.292765e+09 | Yes | No | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | NaN | NaN | NaN | 1 | ... | Unfamiliar (N/A) | I don't understand this question | No | NaN | No | Male | 18-29 | $0 - $24,999 | High school degree | West North Central |
| 4 | 3.292763e+09 | Yes | Yes | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | Star Wars: Episode IV A New Hope | Star Wars: Episode V The Empire Strikes Back | Star Wars: Episode VI Return of the Jedi | 5 | ... | Very favorably | I don't understand this question | No | NaN | Yes | Male | 18-29 | $100,000 - $149,999 | Some college or Associate degree | West North Central |
| 5 | 3.292731e+09 | Yes | Yes | Star Wars: Episode I The Phantom Menace | Star Wars: Episode II Attack of the Clones | Star Wars: Episode III Revenge of the Sith | Star Wars: Episode IV A New Hope | Star Wars: Episode V The Empire Strikes Back | Star Wars: Episode VI Return of the Jedi | 5 | ... | Somewhat favorably | Greedo | Yes | No | No | Male | 18-29 | $100,000 - $149,999 | Some college or Associate degree | West North Central |

5 rows × 38 columns
\n", "\n", "
| | RespondentID | Have you seen any of the 6 films in the Star Wars franchise? | Do you consider yourself to be a fan of the Star Wars film franchise? | seen_1 | seen_2 | seen_3 | seen_4 | seen_5 | seen_6 | Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film. | ... | Unnamed: 28 | Which character shot first? | Are you familiar with the Expanded Universe? | Do you consider yourself to be a fan of the Expanded Universe? | Do you consider yourself to be a fan of the Star Trek franchise? | Gender | Age | Household Income | Education | Location (Census Region) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 3.292880e+09 | True | True | True | True | True | True | True | True | 3 | ... | Very favorably | I don't understand this question | Yes | No | No | Male | 18-29 | NaN | High school degree | South Atlantic |
| 2 | 3.292880e+09 | False | NaN | False | False | False | False | False | False | NaN | ... | NaN | NaN | NaN | NaN | Yes | Male | 18-29 | $0 - $24,999 | Bachelor degree | West South Central |
| 3 | 3.292765e+09 | True | False | True | True | True | False | False | False | 1 | ... | Unfamiliar (N/A) | I don't understand this question | No | NaN | No | Male | 18-29 | $0 - $24,999 | High school degree | West North Central |

3 rows × 38 columns
\n", "