{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Brexit - the data analysis\n", "\n", "We start, as usual, by importing all the libraries we need." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "%matplotlib inline\n", "\n", "# Fancy plots\n", "plt.style.use('fivethirtyeight')\n", "\n", "# Data frame library\n", "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## All about the Brexiteers\n", "\n", "Every year, the [Hansard\n", "Society](https://www.hansardsociety.org.uk/research/audit-of-political-engagement)\n", "sponsors a survey on political engagement in the UK.\n", "\n", "They put topical questions in each survey. For the 2016/17 survey, they asked\n", "about how people voted in the Brexit referendum.\n", "\n", "Luckily, they make the data freely available online for us to analyze.\n", "\n", "You can get the data for yourself from the UK Data Service:\n", "[https://discover.ukdataservice.ac.uk/catalogue/?sn=8183](https://discover.ukdataservice.ac.uk/catalogue/?sn=8183).\n", "There are data files in various formats, including:\n", "\n", "* SPSS format (for the SPSS statistical package);\n", "* Stata format (for the Stata statistical package);\n", "* tab-delimited (a general data format that can be used with Pandas, Excel,\n", " and other packages).\n", "\n", "The data is in a standard form, with one row per respondent, and one column\n", "per question.\n", "\n", "To save you a tiny bit of work, I have made an unchanged copy of the\n", "tab-delimited version of the data file for you to download directly. I have\n", "also made a copy of the document describing the questions they ask and the way\n", "that they have recorded the answers in the data file. This is often called the\n", "“data dictionary”. It was originally in Rich Text Format, but I have converted\n", "it to PDF for convenience. 
It is otherwise identical to the file you will find at\n", "the UK Data Service.\n", "\n", "You can download these copies from the following links:\n", "\n", "* [tab-delimited data file]({{ site.baseurl }}/data/audit_of_political_engagement_14_2017.tab);\n", "* [data dictionary PDF file]({{ site.baseurl }}/data/audit_of_political_engagement_14_2017_ukda_data_dictionary.pdf).\n", "\n", "If you are running this notebook on your laptop, download the tab-delimited\n", "data file to the same directory as the notebook.\n", "\n", "In a moment, we are going to analyze these data. We will focus on\n", "two questions, labeled `cut15` and `numage`. `cut15` is the question\n", "about Brexit. The data dictionary has the *variable label* “CUT15 - How did you\n", "vote on the question ‘Should the United Kingdom remain a member of the European\n", "Union or leave the European Union’?”. The recorded values run from 1 through\n", "6 and have the following labels:\n", "\n", "```\n", "Value label information for cut15\n", "Value = 1.0 Label = Remain a member of the European Union\n", "Value = 2.0 Label = Leave the European Union\n", "Value = 3.0 Label = Did not vote\n", "Value = 4.0 Label = Too young\n", "Value = 5.0 Label = Can't remember\n", "Value = 6.0 Label = Refused\n", "```\n", "\n", "We also want the variable `numage`; this is the age of the respondent in years." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The data file that you just downloaded should be called\n", "`audit_of_political_engagement_14_2017.tab`. 
The cell below loads the data\n", "file into memory with Pandas:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# Load the data frame, and put it in the variable \"audit_data\"\n", "audit_data = pd.read_table('audit_of_political_engagement_14_2017.tab')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As you know, we now have a *data frame*:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "pandas.core.frame.DataFrame" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(audit_data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The data frame has one row per person surveyed, and one column for each\n", "question in the survey. The columns have kind-of helpful names that you can\n", "read about in the data dictionary:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " cu041 cu042 cu043 cu044 cu045 cu046 cu047 cu048 cu049 cu0410 \\\n", "0 0 0 0 0 1 1 0 0 0 0 \n", "1 0 0 0 0 0 0 0 0 0 1 \n", "2 0 0 0 0 1 0 0 0 0 0 \n", "3 0 0 0 0 1 0 1 0 0 0 \n", "4 0 0 0 1 1 0 1 0 0 0 \n", "5 1 1 0 0 0 0 0 0 0 0 \n", "6 0 0 0 0 1 0 0 0 0 0 \n", "7 1 0 0 0 1 0 0 0 0 0 \n", "8 0 0 0 0 1 0 0 0 0 0 \n", "9 1 0 0 1 1 0 1 0 0 0 \n", "10 0 0 0 1 0 0 0 0 0 0 \n", "11 0 0 0 0 0 0 0 0 0 0 \n", "12 0 0 0 0 0 0 0 0 0 0 \n", "13 1 0 0 0 0 1 0 0 0 0 \n", "14 1 0 0 0 1 1 1 0 0 0 \n", "15 0 0 0 1 0 1 0 0 0 0 \n", "16 0 0 0 0 0 1 0 0 0 0 \n", "17 1 0 0 0 0 0 0 0 0 0 \n", "18 0 0 0 0 0 1 0 0 0 0 \n", "19 0 0 0 0 0 0 0 0 0 0 \n", "20 0 0 0 0 0 0 0 0 0 0 \n", "21 0 0 0 0 0 1 0 0 0 0 \n", "22 0 0 0 0 0 0 0 0 0 0 \n", "23 0 0 0 0 0 0 0 0 0 0 \n", "24 0 0 0 0 1 0 0 0 0 0 \n", "25 0 0 0 0 0 0 0 0 0 0 \n", "26 1 0 0 0 1 1 0 0 0 0 \n", "27 0 0 0 0 0 0 0 0 0 0 \n", "28 0 0 0 0 0 0 0 0 0 0 \n", "29 0 0 0 1 0 0 0 0 0 0 \n", "... ... ... ... ... ... ... ... ... ... ... \n", "1741 0 0 0 0 0 0 0 0 0 0 \n", "1742 0 0 0 0 0 0 0 0 0 0 \n", "1743 0 0 0 1 1 0 0 0 0 0 \n", "1744 0 0 0 0 0 0 0 0 0 0 \n", "1745 0 0 0 1 0 1 0 0 0 0 \n", "1746 0 0 0 0 0 0 0 0 0 0 \n", "1747 0 0 0 0 0 0 0 0 0 0 \n", "1748 0 0 0 0 0 0 0 0 0 0 \n", "1749 0 0 0 0 0 1 0 0 0 0 \n", "1750 0 0 0 0 0 0 0 0 0 0 \n", "1751 0 0 0 0 0 0 0 0 0 0 \n", "1752 0 0 0 0 0 0 0 0 0 0 \n", "1753 0 0 0 0 0 0 0 0 0 0 \n", "1754 0 0 0 0 0 0 0 0 0 0 \n", "1755 0 0 0 0 0 0 0 0 0 0 \n", "1756 0 0 0 0 0 0 0 0 0 0 \n", "1757 0 0 0 0 0 0 0 0 0 0 \n", "1758 0 0 0 0 0 0 0 0 0 0 \n", "1759 1 0 0 0 0 0 0 0 0 0 \n", "1760 0 0 0 0 0 0 0 0 0 0 \n", "1761 0 0 0 0 1 0 0 0 0 0 \n", "1762 0 0 0 0 0 0 0 0 0 0 \n", "1763 0 0 0 0 0 0 0 0 0 0 \n", "1764 0 0 1 1 0 1 0 0 0 0 \n", "1765 0 0 0 0 0 0 0 0 0 0 \n", "1766 0 0 0 0 0 0 0 0 0 0 \n", "1767 0 0 0 0 0 0 0 0 0 0 \n", "1768 0 0 0 0 0 0 0 0 0 0 \n", "1769 0 0 0 0 0 0 0 0 0 0 \n", "1770 0 0 0 0 0 0 0 0 0 0 \n", "\n", " ... 
intten cx_971_980 serial week wts numage weight0 \\\n", "0 ... -1 3.41659 1399 648 3.41659 37 3.41659 \n", "1 ... -1 2.68198 1733 648 2.68198 55 2.68198 \n", "2 ... -1 0.79379 1736 648 0.79379 71 0.79379 \n", "3 ... -1 1.40580 1737 648 1.40580 37 1.40580 \n", "4 ... -1 0.89475 1738 648 0.89475 42 0.89475 \n", "5 ... -1 3.22535 1801 648 3.22535 0 3.22535 \n", "6 ... -1 1.52922 1802 648 1.52922 69 1.52922 \n", "7 ... -1 2.89655 1803 648 2.89655 20 2.89655 \n", "8 ... -1 4.66393 1804 648 4.66393 38 4.66393 \n", "9 ... -1 1.43732 1806 648 1.43732 60 1.43732 \n", "10 ... -1 1.81109 1807 648 1.81109 0 1.81109 \n", "11 ... -1 4.18632 1808 648 4.18632 32 4.18632 \n", "12 ... -1 1.81382 1809 648 1.81382 79 1.81382 \n", "13 ... -1 1.73488 1811 648 1.73488 0 1.73488 \n", "14 ... -1 1.22631 1812 648 1.22631 58 1.22631 \n", "15 ... -1 1.73024 1813 648 1.73024 46 1.73024 \n", "16 ... -1 1.83130 1814 648 1.83130 51 1.83130 \n", "17 ... -1 0.50172 1815 648 0.50172 74 0.50172 \n", "18 ... -1 3.33900 1816 648 3.33900 57 3.33900 \n", "19 ... -1 1.89094 1817 648 1.89094 61 1.89094 \n", "20 ... -1 3.80239 1818 648 3.80239 47 3.80239 \n", "21 ... -1 1.55589 1819 648 1.55589 56 1.55589 \n", "22 ... -1 2.73494 1820 648 2.73494 87 2.73494 \n", "23 ... -1 3.18552 1821 648 3.18552 76 3.18552 \n", "24 ... -1 4.38354 1822 648 4.38354 20 4.38354 \n", "25 ... -1 0.63495 1823 648 0.63495 35 0.63495 \n", "26 ... -1 0.48390 1824 648 0.48390 28 0.48390 \n", "27 ... -1 0.85149 1825 648 0.85149 38 0.85149 \n", "28 ... -1 1.12183 1828 648 1.12183 44 1.12183 \n", "29 ... -1 2.38711 1829 648 2.38711 38 2.38711 \n", "... ... ... ... ... ... ... ... ... \n", "1741 ... 9 0.45925 3262 649 0.45925 35 0.45925 \n", "1742 ... 9 0.97481 3284 649 0.97481 39 0.97481 \n", "1743 ... 9 0.97507 3291 649 0.97507 39 0.97507 \n", "1744 ... 9 1.81637 3295 649 1.81637 44 1.81637 \n", "1745 ... 9 0.13786 3329 649 0.13786 18 0.13786 \n", "1746 ... 9 0.34092 3344 649 0.34092 40 0.34092 \n", "1747 ... 
9 1.09296 3364 649 1.09296 60 1.09296 \n", "1748 ... 4 1.16371 3367 649 1.16371 36 1.16371 \n", "1749 ... 9 0.98811 3368 649 0.98811 72 0.98811 \n", "1750 ... 9 0.36535 3370 649 0.36535 70 0.36535 \n", "1751 ... 9 0.76167 3372 649 0.76167 31 0.76167 \n", "1752 ... 4 0.24729 3377 649 0.24729 20 0.24729 \n", "1753 ... 6 0.17248 3378 649 0.17248 67 0.17248 \n", "1754 ... 5 0.22854 3380 649 0.22854 54 0.22854 \n", "1755 ... 9 0.39888 3382 649 0.39888 18 0.39888 \n", "1756 ... 9 0.17631 3384 649 0.17631 18 0.17631 \n", "1757 ... 9 0.32814 3386 649 0.32814 24 0.32814 \n", "1758 ... 9 0.21588 3388 649 0.21588 20 0.21588 \n", "1759 ... 9 0.61616 3389 649 0.61616 36 0.61616 \n", "1760 ... 9 0.21250 3390 649 0.21250 42 0.21250 \n", "1761 ... 9 0.32776 3392 649 0.32776 37 0.32776 \n", "1762 ... 7 0.27406 3394 649 0.27406 19 0.27406 \n", "1763 ... 9 0.63120 3399 649 0.63120 36 0.63120 \n", "1764 ... 9 0.65792 3407 649 0.65792 67 0.65792 \n", "1765 ... 9 0.54415 3422 649 0.54415 40 0.54415 \n", "1766 ... 9 0.44339 3423 649 0.44339 39 0.44339 \n", "1767 ... 9 0.44086 3425 649 0.44086 20 0.44086 \n", "1768 ... 9 0.32590 3426 649 0.32590 31 0.32590 \n", "1769 ... 9 0.66970 3427 649 0.66970 47 0.66970 \n", "1770 ... 9 0.39478 3434 649 0.39478 25 0.39478 \n", "\n", " sgrade_grp age_grp region2 \n", "0 1 4 3 \n", "1 2 6 3 \n", "2 2 7 4 \n", "3 1 4 4 \n", "4 2 4 4 \n", "5 1 1 3 \n", "6 1 7 3 \n", "7 2 2 3 \n", "8 1 4 3 \n", "9 4 6 3 \n", "10 3 1 3 \n", "11 1 3 3 \n", "12 2 8 3 \n", "13 4 1 4 \n", "14 1 6 4 \n", "15 1 5 4 \n", "16 1 5 4 \n", "17 3 7 4 \n", "18 1 6 3 \n", "19 3 6 3 \n", "20 2 5 3 \n", "21 4 6 3 \n", "22 2 8 3 \n", "23 1 8 3 \n", "24 1 2 3 \n", "25 3 4 4 \n", "26 2 3 1 \n", "27 2 4 1 \n", "28 3 4 3 \n", "29 2 4 3 \n", "... ... ... ... 
\n", "1741 3 4 4 \n", "1742 3 4 6 \n", "1743 1 4 5 \n", "1744 2 4 5 \n", "1745 2 2 6 \n", "1746 1 4 5 \n", "1747 3 6 4 \n", "1748 2 4 4 \n", "1749 4 7 4 \n", "1750 4 7 6 \n", "1751 1 3 6 \n", "1752 4 2 6 \n", "1753 4 7 6 \n", "1754 4 5 6 \n", "1755 4 2 6 \n", "1756 2 2 6 \n", "1757 4 2 6 \n", "1758 2 2 6 \n", "1759 3 4 6 \n", "1760 4 4 6 \n", "1761 1 4 6 \n", "1762 4 2 6 \n", "1763 1 4 6 \n", "1764 4 7 4 \n", "1765 2 4 4 \n", "1766 1 4 4 \n", "1767 3 2 4 \n", "1768 3 3 4 \n", "1769 4 5 4 \n", "1770 4 3 4 \n", "\n", "[1771 rows x 370 columns]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "audit_data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The data frame has columns for all the questions listed in the data\n", "dictionary:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['cu041', 'cu042', 'cu043', 'cu044', 'cu045', 'cu046', 'cu047', 'cu048',\n", " 'cu049', 'cu0410',\n", " ...\n", " 'intten', 'cx_971_980', 'serial', 'week', 'wts', 'numage', 'weight0',\n", " 'sgrade_grp', 'age_grp', 'region2'],\n", " dtype='object', length=370)" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "audit_data.columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To reduce clutter, we make a new data frame that has just the two\n", "questions we are interested in. To do this, we first make a list with the names of the columns we want:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "desired_columns = [\"numage\", \"cut15\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then we index the data frame with a list of these names, to make a new data frame that has only the named columns, like this:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " numage cut15\n", "0 37 1\n", "1 55 1\n", "2 71 2\n", "3 37 1\n", "4 42 1\n", "5 0 1\n", "6 69 1\n", "7 20 1\n", "8 38 1\n", "9 60 2\n", "10 0 1\n", "11 32 1\n", "12 79 3\n", "13 0 2\n", "14 58 1\n", "15 46 1\n", "16 51 1\n", "17 74 2\n", "18 57 1\n", "19 61 2\n", "20 47 2\n", "21 56 2\n", "22 87 1\n", "23 76 2\n", "24 20 3\n", "25 35 2\n", "26 28 1\n", "27 38 1\n", "28 44 2\n", "29 38 2\n", "... ... ...\n", "1741 35 1\n", "1742 39 3\n", "1743 39 1\n", "1744 44 1\n", "1745 18 4\n", "1746 40 1\n", "1747 60 3\n", "1748 36 3\n", "1749 72 1\n", "1750 70 1\n", "1751 31 3\n", "1752 20 3\n", "1753 67 1\n", "1754 54 2\n", "1755 18 2\n", "1756 18 4\n", "1757 24 3\n", "1758 20 2\n", "1759 36 3\n", "1760 42 3\n", "1761 37 1\n", "1762 19 5\n", "1763 36 6\n", "1764 67 1\n", "1765 40 2\n", "1766 39 1\n", "1767 20 3\n", "1768 31 2\n", "1769 47 3\n", "1770 25 3\n", "\n", "[1771 rows x 2 columns]" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Select the age and Brexit vote questions only\n", "mini_brexit = audit_data[['numage', 'cut15']]\n", "mini_brexit" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To get started on exploring, we make a new variable `ages` that refers to the\n", "`numage` column in the `mini_brexit` data frame." 
] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "# Make a new variable \"ages\" that refers to the \"numage\" column in \"mini_brexit\"\n", "ages = mini_brexit[\"numage\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Confirm that `ages` has a value of type `Series`, the Pandas type for a column of a data frame:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "pandas.core.series.Series" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(ages)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here are the number of rows and the number of columns in the original data frame:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(1771, 370)" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "audit_data.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Run the cell below to confirm that `ages` has the same number of values as\n", "`audit_data` has rows. To do this, we can use the `len` function, as applied\n", "to the `ages` Series. It returns the number of values." 
] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1771" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(ages)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In fact, `len`, as applied to the *data frame*, returns the number of rows:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1771" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(audit_data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Start by doing a histogram of the values in `ages` (which are also the values\n", "in the `numage` column of `mini_brexit`). If you can't remember how to do\n", "histograms, have a look at the [introduction to data\n", "frames](../04/data_frame_intro) notebook. Hint: consider using the `hist`\n", "method of the `ages` variable." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "# Do a histogram of the values in the \"numage\" column.\n", "# Your code here." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You will see that a few subjects have an age of 0.\n", "\n", "It looks as if the survey coders are using the value 0 to mean that the person\n", "did not state their age. We will have to clean that up. We do that by\n", "selecting the cases that have ages not equal to 0." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Hint: You have seen the operator to say whether two values are equal or not:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "1 == 2" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "2 == 2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The operator for *not equal* is `!=`, as in:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "1 != 2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Prepare for a brain-bending double negative..." ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "2 != 2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To identify the values in `ages` that are *not equal* to 0, use the comparison\n", "hinted at above to make a new variable, `age_not_0`, that has the same\n", "number of values as `ages`, and has `True` at positions where `ages` is *not\n", "equal* to 0, and `False` otherwise. We will refer to these sequences of True\n", "and False values as *Boolean vectors*.\n", "\n", "Check back to the [introduction to data frames](../04/data_frame_intro)\n", "notebook for a reminder of making and using Boolean vectors to select rows from\n", "data frames."
] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "# Create new variable \"age_not_0\", with True at positions where \"ages\" is not\n", "# equal to 0, and False otherwise.\n", "# age_not_0 = ?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use `age_not_0` to select rows in the `mini_brexit` data frame where the value\n", "is `True`, and throw away the rows where the value is `False`. To do this, use\n", "the `loc` attribute of the data frame. It *locates* values:" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "# Select rows in the data frame where the age is not equal to 0. Make a new data\n", "# frame called \"good_brexit\" that only contains these rows. Your code will start\n", "# good_brexit = mini_brexit.loc?\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we want to ask what proportion of the respondents said that they voted\n", "Remain or Leave.\n", "\n", "First we make a new data frame that contains only the rows for people who said\n", "they voted Remain in the referendum. Remember, from the data dictionary,\n", "that 1 is the code for a Remain vote.\n", "\n", "To start, make a new variable `votes` that has the values of the `cut15` column\n", "of the `good_brexit` data frame." ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "# Make a new variable \"votes\" that refers to the \"cut15\" column in \"good_brexit\".\n", "# Your code will start with\n", "# votes = ?\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now make a new Boolean vector that has True at the positions where `votes` is\n", "equal to 1, and False otherwise. Call this variable `is_remain`."
] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "# Make a Boolean vector, called \"is_remain\", that is True for Remain rows, False\n", "# otherwise.\n", "# is_remain = ?\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, use `is_remain` to select the rows in `good_brexit` that correspond to\n", "confessed \"Remain\" voters. Call the new data frame `remainers`:" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "# Select the rows from \"good_brexit\" that correspond to Remain voters\n", "# remainers = good_brexit?\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Do a histogram of the values in the `numage` column of `remainers`:" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "# Show a histogram of the `numage` column from `remainers`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, go through the same steps to make a new data frame for those who said\n", "they voted Leave (code 2):" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "# Make a Boolean vector, called \"is_leave\", that is True for Leave rows, False\n", "# otherwise.\n", "# is_leave = ?\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, use `is_leave` to select the rows in `good_brexit` that correspond to confessed \"Leave\" voters.
Call the new data frame `leavers`:" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "# Select the rows from \"good_brexit\" that correspond to Leave voters\n", "# leavers = good_brexit?\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Do a histogram of the values in the `numage` column of `leavers`:" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [], "source": [ "# Show a histogram of the `numage` column from `leavers`\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Uncomment the lines in the cell below to get the total number of Remain voters:" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": [ "# n_remain = len(remainers)\n", "# n_remain" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here is the total number of Leave voters:" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [], "source": [ "# n_leave = len(leavers)\n", "# n_leave" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here is the total number of voters who confessed to a specific Leave or Remain vote:" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [], "source": [ "# n_total = n_leave + n_remain\n", "# n_total" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here is the proportion of Leave voters:" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [], "source": [ "# leave_proportion = n_leave / n_total\n", "# leave_proportion" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As you remember, the proportion of Leave voters in the referendum was 51.9%.\n", "`leave_proportion` from the survey seems a way off.
Is it too far off?\n", "\n", "You go back to the survey company and tell them that the proportion of Leave voters seems too low.\n", "\n", "They say the following:\n", "\n", "> We took a random sample of the population. You are a data scientist, you\n", "> know well that the proportion from this random sample is very unlikely to be\n", "> exactly the same as the proportion in the whole population. The proportion\n", "> we get is compatible with the variation we expect from taking a random sample.\n", ">\n", "> In other words - the difference in the proportions, between the referendum\n", "> and the survey, is due to sampling error.\n", "\n", "Time for a simulation.\n", "\n", "The null hypothesis offered by the survey company is that the proportion we saw\n", "above is a plausible value if we took a random sample of `n_total` voters.\n", "\n", "We can simulate a new survey, with `n_total` voters, by taking `n_total` random\n", "numbers between 0 and 1. We consider the values less than 0.519 as\n", "corresponding to a Leave vote, and the rest as Remain votes. We then\n", "calculate the proportion of Leave votes (the proportion of values that are\n", "less than 0.519).\n", "\n", "We do this 10000 times, to get 10000 simulated surveys. We calculate the\n", "proportion for each simulated survey, and do a histogram of the proportions.\n", "Is `leave_proportion` a plausible value on this histogram?\n", "\n", "See:\n", "\n", "* [3.8 Reply to the Supreme Court](../03/reply_supreme)\n", "* [3.9 Revision - three girls](../03/three_girls)\n", "\n", "to remind yourself about simulations."
] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [], "source": [ "# Your simulation here" ] } ], "metadata": { "jupytext": { "text_representation": { "extension": ".Rmd", "format_name": "rmarkdown", "format_version": "1.0", "jupytext_version": "0.8.5" } }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.1" } }, "nbformat": 4, "nbformat_minor": 2 }