{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Project 1: Standardized Testing, Statistical Summaries and Inference\n", "\n", "### Marco Tavora" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Overview \n", "\n", "Suppose that the College Board, a nonprofit organization responsible for administering the SAT (Scholastic Aptitude Test), seeks to increase the rate of high-school graduates who participate in its exams. This project's aim is to make recommendations about which measures the College Board might take in order to achieve that." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Problem Statement\n", "\n", "The problem we need to solve is to how to make actionable suggestions to the College Board to help them increase the participation rates in their exams. For that we need to perform an exploratory data analysis (EDA) to find appropriate metrics that can be adjusted by the College Board accordingly. \n", "\n", "Performing the EDA we must among other things:\n", "\n", "- Find relevants patterns in the data \n", "- Search for possible relations between subsets of the data (for example, are scores and participation rates correlated? If yes how?)\n", "- Test hypotheses about the data using statistical inference method\n", "- Identify possible biases in the data and, if possible, suggest corrections\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Brief introduction to the data\n", "\n", "The data is based on the SAT and the ACT (which stands for American College Testing and it is administered by another institution, namely, the ACT. Inc) exams from around the United States in 2017.\n", "\n", "The data contains:\n", "\n", "- Average SAT and ACT scores by state (scores for each section of each exam);\n", "\n", "- Participation rates for both exams by state.\n", "\n", "Both SAT and ACT are standardized tests for college admissions and are similar in content type but have some differences in structure. A few relevant differences are:\n", "\n", "- The ACT has a Science Test and the SAT does not;\n", "\n", "- There is a SAT Math Section for which the student is not allowed to use a calculator; \n", "\n", "- The SAT's College Board joins Reading and Writing into one score, the \"Evidence-Based Reading and Writing\" whereas in the ACT the tests are separated. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## EDA Steps" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 0: Importing basic modules" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We first need to import Python libraries including:\n", "- `Pandas`, for data manipulation and analysis \n", "- `SciPy` which is a Python-based ecosystem of software for mathematics, science, and engineering. \n", "- `NumPy` which is a library consisting of multidimensional array objects and a collection of routines for processing of array. \n", "- `Statsmodels` which is a Python package that allows users to explore data, estimate statistical models, and perform statistical tests and complements `SciPy`;\n", "- `Matplotlib` is a plotting library for the Python and NumPy;\n", "- `Seaborn` is complimentary to Matplotlib and it specifically targets statistical data visualization\n", "- `Pylab` is embedded inside Matplotlib and provides a Matlab-like experience for the user. It imports portions of Matplotlib and NumPy.\n", "\n", "OBS: This information was taken directly from the documentation." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/Users/marcotavora/Applications/anaconda3/lib/python3.6/site-packages/statsmodels/compat/pandas.py:56: FutureWarning: The pandas.core.datetools module is deprecated and will be removed in a future version. Please use the pandas.tseries module instead.\n", " from pandas.core import datetools\n" ] } ], "source": [ "import scipy\n", "import pandas as pd\n", "import scipy.stats as stats\n", "import numpy as np\n", "import csv\n", "import statsmodels.api as sm\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "sns.set_style(\"darkgrid\")\n", "import pytest\n", "import pylab as p\n", "%matplotlib inline\n", "import pandas as pd\n", "from IPython.core.interactiveshell import InteractiveShell\n", "InteractiveShell.ast_node_interactivity = \"all\" # so we can see the value of multiple statements at once.\n", "pd.set_option('display.max_columns', None)\n", "pd.set_option('display.max_rows', None)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 1: Load the data and perform basic operations." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 1 & 2. Load the data in, using pandas, and print the first ten rows of each DataFrame.\n", "\n", "We can refer to the Pandas module using the \"dot notation\" to call its *methods*. To read our data (which is in the form of `csv` files), into a so-called DataFrame structure, we use the method `read_csv()` and pass in each file name as a string:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Unnamed: 0StateParticipationEvidence-Based Reading and WritingMathTotal
00Alabama5%5935721165
11Alaska38%5475331080
22Arizona30%5635531116
33Arkansas3%6145941208
44California53%5315241055
55Colorado11%6065951201
66Connecticut100%5305121041
77Delaware100%503492996
88District of Columbia100%482468950
99Florida83%5204971017
\n", "
" ], "text/plain": [ " Unnamed: 0 State Participation \\\n", "0 0 Alabama 5% \n", "1 1 Alaska 38% \n", "2 2 Arizona 30% \n", "3 3 Arkansas 3% \n", "4 4 California 53% \n", "5 5 Colorado 11% \n", "6 6 Connecticut 100% \n", "7 7 Delaware 100% \n", "8 8 District of Columbia 100% \n", "9 9 Florida 83% \n", "\n", " Evidence-Based Reading and Writing Math Total \n", "0 593 572 1165 \n", "1 547 533 1080 \n", "2 563 553 1116 \n", "3 614 594 1208 \n", "4 531 524 1055 \n", "5 606 595 1201 \n", "6 530 512 1041 \n", "7 503 492 996 \n", "8 482 468 950 \n", "9 520 497 1017 " ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sat = pd.read_csv('sat.csv')\n", "sat.head(10)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Unnamed: 0StateParticipationEnglishMathReadingScienceComposite
00National60%20.320.721.421.021.0
11Alabama100%18.918.419.719.419.2
22Alaska65%18.719.820.419.919.8
33Arizona62%18.619.820.119.819.7
44Arkansas100%18.919.019.719.519.4
55California31%22.522.723.122.222.8
66Colorado100%20.120.321.220.920.8
77Connecticut31%25.524.625.624.625.2
88Delaware18%24.123.424.823.624.1
99District of Columbia32%24.423.524.923.524.2
\n", "
" ], "text/plain": [ " Unnamed: 0 State Participation English Math Reading \\\n", "0 0 National 60% 20.3 20.7 21.4 \n", "1 1 Alabama 100% 18.9 18.4 19.7 \n", "2 2 Alaska 65% 18.7 19.8 20.4 \n", "3 3 Arizona 62% 18.6 19.8 20.1 \n", "4 4 Arkansas 100% 18.9 19.0 19.7 \n", "5 5 California 31% 22.5 22.7 23.1 \n", "6 6 Colorado 100% 20.1 20.3 21.2 \n", "7 7 Connecticut 31% 25.5 24.6 25.6 \n", "8 8 Delaware 18% 24.1 23.4 24.8 \n", "9 9 District of Columbia 32% 24.4 23.5 24.9 \n", "\n", " Science Composite \n", "0 21.0 21.0 \n", "1 19.4 19.2 \n", "2 19.9 19.8 \n", "3 19.8 19.7 \n", "4 19.5 19.4 \n", "5 22.2 22.8 \n", "6 20.9 20.8 \n", "7 24.6 25.2 \n", "8 23.6 24.1 \n", "9 23.5 24.2 " ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "act = pd.read_csv('act.csv')\n", "act.head(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that the first columns of both tables seem to be identical to the DataFrame indexes. We can quickly confirm that using an `assert` statement. When an `assert` statement is encountered, Python evaluates it and if the expression is false raises an `AssertionError` exception (from www.tutorialspoint.com)." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "assert (sat.index.tolist() == sat['Unnamed: 0'].tolist())\n", "assert (act.index.tolist() == act['Unnamed: 0'].tolist())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "From the the box above we are allowed to drop the first columns of both DataFrames.\n", "\n", "Next, I renamed the column \"Evidence-Based Reading and Writing\" as \"EBRW\" to be able to use methods on it via the dot notation. \n", "\n", "Dropping also the last column of the SAT table (which is just the sum of the two previous ones) and the last row and renaming the others we obtain the following DataFrame for the SAT scores:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
StateParticipationEBRWMath
0Alabama5%593572
1Alaska38%547533
2Arizona30%563553
3Arkansas3%614594
4California53%531524
\n", "
" ], "text/plain": [ " State Participation EBRW Math\n", "0 Alabama 5% 593 572\n", "1 Alaska 38% 547 533\n", "2 Arizona 30% 563 553\n", "3 Arkansas 3% 614 594\n", "4 California 53% 531 524" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cols_to_keep = ['State', 'Participation', 'Evidence-Based Reading and Writing', 'Math']\n", "sat = sat[cols_to_keep]\n", "sat.columns = ['State', 'Participation', 'EBRW', 'Math']\n", "sat.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As with the SAT DataFrame, we drop last column of the ACT DataFrame (the Composite score is, according to the ACT website, just the average of the four other test scores, rounded to the nearest whole number).\n", "\n", "Furthermore, I used the `.iloc( )` method to exclude the first row of the ACT frame since it is just a summary row." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
StateParticipationEBRWMath
0Alabama5%593572
1Alaska38%547533
2Arizona30%563553
3Arkansas3%614594
4California53%531524
\n", "
" ], "text/plain": [ " State Participation EBRW Math\n", "0 Alabama 5% 593 572\n", "1 Alaska 38% 547 533\n", "2 Arizona 30% 563 553\n", "3 Arkansas 3% 614 594\n", "4 California 53% 531 524" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
StateParticipationEnglishMathReadingScience
1Alabama100%18.918.419.719.4
2Alaska65%18.719.820.419.9
3Arizona62%18.619.820.119.8
4Arkansas100%18.919.019.719.5
5California31%22.522.723.122.2
\n", "
" ], "text/plain": [ " State Participation English Math Reading Science\n", "1 Alabama 100% 18.9 18.4 19.7 19.4\n", "2 Alaska 65% 18.7 19.8 20.4 19.9\n", "3 Arizona 62% 18.6 19.8 20.1 19.8\n", "4 Arkansas 100% 18.9 19.0 19.7 19.5\n", "5 California 31% 22.5 22.7 23.1 22.2" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cols_to_keep = ['State', 'Participation', 'English', 'Math', 'Reading', 'Science']\n", "act = act.iloc[1:][cols_to_keep]\n", "sat.head()\n", "act.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 3. Describe in words what each variable (column) is.\n", "\n", "\n", "**SAT**\n", " \n", "The table displays three different averages for each state: \n", " - The first column is the state \n", " - The second column is the average participation of students in each state\n", " - The third and fourth columns are the average scores in the Math and Evidence-Based Reading and Writing tests (the name EBRW is explained above).\n", "\n", "**ACT**\n", " \n", "The table displays the following averages for each State: \n", " - The first column is the state \n", " - The second column is the average participation of students in that state\n", " - The third, fourth, fifth and sixth columns are the scores in the English, Math, Reading and Science tests" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 4. Does the data look complete? Are there any obvious issues with the observations?\n", "\n", "We can look for problems with the data for example:\n", "\n", "- Using `info()` \n", "- Using `describe()`\n", "- Looking at the last rows and or last columns which frequently contain aggregate values\n", "- Looking for null values\n", "- Outliers\n", "\n", "The third item was taken care of. There were no null values but there are outliers as we shall see when we perform the plotting." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "State False\n", "Participation False\n", "EBRW False\n", "Math False\n", "dtype: bool\n", "State False\n", "Participation False\n", "English False\n", "Math False\n", "Reading False\n", "Science False\n", "dtype: bool\n" ] } ], "source": [ "df_sat = sat.copy() # making copies to keep original ones intact\n", "df_act = act.copy() # making copies to keep original ones intact\n", "print (df_sat.isnull().any())\n", "print (df_act.isnull().any())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 5. Print the types of each column." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "State object\n", "Participation object\n", "EBRW int64\n", "Math int64\n", "dtype: object\n", "State object\n", "Participation object\n", "English float64\n", "Math float64\n", "Reading float64\n", "Science float64\n", "dtype: object\n" ] } ], "source": [ "print (sat.dtypes)\n", "print (act.dtypes)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### 6. Do any types need to be reassigned? If so, go ahead and do it.\n", "\n", "\n", "I will convert the columns 'Participation' into `floats` using a function to extract the $\\%$ but keeping the scale between 0 and 100. The `.replace( )` method takes the argument `regex` =`True` because `type('%')` = `str`. The function was based on http://pythonjourney.com/python-pandas-dataframe-convert-percent-to-float/" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "def perc_into_float(df,col):\n", " return df[col].replace('%','',regex=True).astype('float')" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "df_sat['Participation'] = perc_into_float(df_sat,'Participation')\n", "df_act['Participation'] = perc_into_float(df_act,'Participation')" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
StateParticipationEBRWMath
0Alabama5.0593572
1Alaska38.0547533
\n", "
" ], "text/plain": [ " State Participation EBRW Math\n", "0 Alabama 5.0 593 572\n", "1 Alaska 38.0 547 533" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_sat.head(2)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
StateParticipationEnglishMathReadingScience
1Alabama100.018.918.419.719.4
2Alaska65.018.719.820.419.9
\n", "
" ], "text/plain": [ " State Participation English Math Reading Science\n", "1 Alabama 100.0 18.9 18.4 19.7 19.4\n", "2 Alaska 65.0 18.7 19.8 20.4 19.9" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_act.head(2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Checking types again:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "State object\n", "Participation float64\n", "EBRW int64\n", "Math int64\n", "dtype: object\n", "State object\n", "Participation float64\n", "English float64\n", "Math float64\n", "Reading float64\n", "Science float64\n", "dtype: object\n" ] } ], "source": [ "print (df_sat.dtypes)\n", "print (df_act.dtypes)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 7. Create a dictionary for each column mapping the State to its respective value for that column. (For example, you should have three SAT dictionaries.)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Answer version 1**\n", "\n", "Using a list comprehension, the command:\n", "\n", " df.set_index('State').to_dict()\n", " \n", "creates a dictionary with column names as keys. For example:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'Participation': {'Alabama': 5.0, 'Alaska': 38.0, 'Arizona': 30.0, 'Arkansas': 3.0, 'California': 53.0, 'Colorado': 11.0, 'Connecticut': 100.0, 'Delaware': 100.0, 'District of Columbia': 100.0, 'Florida': 83.0, 'Georgia': 61.0, 'Hawaii': 55.0, 'Idaho': 93.0, 'Illinois': 9.0, 'Indiana': 63.0, 'Iowa': 2.0, 'Kansas': 4.0, 'Kentucky': 4.0, 'Louisiana': 4.0, 'Maine': 95.0, 'Maryland': 69.0, 'Massachusetts': 76.0, 'Michigan': 100.0, 'Minnesota': 3.0, 'Mississippi': 2.0, 'Missouri': 3.0, 'Montana': 10.0, 'Nebraska': 3.0, 'Nevada': 26.0, 'New Hampshire': 96.0, 'New Jersey': 70.0, 'New Mexico': 11.0, 'New York': 67.0, 'North Carolina': 49.0, 'North Dakota': 2.0, 'Ohio': 12.0, 'Oklahoma': 7.0, 'Oregon': 43.0, 'Pennsylvania': 65.0, 'Rhode Island': 71.0, 'South Carolina': 50.0, 'South Dakota': 3.0, 'Tennessee': 5.0, 'Texas': 62.0, 'Utah': 3.0, 'Vermont': 60.0, 'Virginia': 65.0, 'Washington': 64.0, 'West Virginia': 14.0, 'Wisconsin': 3.0, 'Wyoming': 3.0}, 'EBRW': {'Alabama': 593, 'Alaska': 547, 'Arizona': 563, 'Arkansas': 614, 'California': 531, 'Colorado': 606, 'Connecticut': 530, 'Delaware': 503, 'District of Columbia': 482, 'Florida': 520, 'Georgia': 535, 'Hawaii': 544, 'Idaho': 513, 'Illinois': 559, 'Indiana': 542, 'Iowa': 641, 'Kansas': 632, 'Kentucky': 631, 'Louisiana': 611, 'Maine': 513, 'Maryland': 536, 'Massachusetts': 555, 'Michigan': 509, 'Minnesota': 644, 'Mississippi': 634, 'Missouri': 640, 'Montana': 605, 'Nebraska': 629, 'Nevada': 563, 'New Hampshire': 532, 'New Jersey': 530, 'New Mexico': 577, 'New York': 528, 'North Carolina': 546, 'North Dakota': 635, 'Ohio': 578, 'Oklahoma': 530, 'Oregon': 560, 'Pennsylvania': 540, 'Rhode Island': 539, 'South Carolina': 543, 'South Dakota': 612, 'Tennessee': 623, 'Texas': 513, 'Utah': 624, 'Vermont': 562, 'Virginia': 561, 'Washington': 541, 'West Virginia': 558, 'Wisconsin': 642, 'Wyoming': 626}, 'Math': {'Alabama': 572, 'Alaska': 533, 'Arizona': 553, 'Arkansas': 594, 'California': 524, 'Colorado': 595, 'Connecticut': 512, 'Delaware': 492, 'District of Columbia': 468, 'Florida': 497, 'Georgia': 515, 'Hawaii': 541, 'Idaho': 493, 'Illinois': 556, 'Indiana': 532, 'Iowa': 635, 'Kansas': 628, 'Kentucky': 616, 'Louisiana': 586, 'Maine': 499, 'Maryland': 52, 'Massachusetts': 551, 'Michigan': 495, 'Minnesota': 651, 'Mississippi': 607, 'Missouri': 631, 'Montana': 591, 'Nebraska': 625, 'Nevada': 553, 'New Hampshire': 520, 'New Jersey': 526, 'New Mexico': 561, 'New York': 523, 'North Carolina': 535, 'North Dakota': 621, 'Ohio': 570, 'Oklahoma': 517, 'Oregon': 548, 'Pennsylvania': 531, 'Rhode Island': 524, 'South Carolina': 521, 'South Dakota': 603, 'Tennessee': 604, 'Texas': 507, 'Utah': 614, 'Vermont': 551, 'Virginia': 541, 'Washington': 534, 'West Virginia': 528, 'Wisconsin': 649, 'Wyoming': 604}}\n" ] } ], "source": [ "print (df_sat.set_index('State').to_dict())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We obtain the dictionary values using\n", "\n", " [cols[i]],\n", " \n", "where `cols` are the column names and `cols[i]` are the keys for $i$ within the range of the number of columns. We then return the elements of this list of dictionaries." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "def dict_all(df,cols,n):\n", " return [df.set_index('State').to_dict()[cols[i]] for i in range(1,len(cols))][n]" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "dsat_part = dict_all(df_sat,df_sat.columns.tolist(),0)\n", "dsat_EBRW = dict_all(df_sat,df_sat.columns.tolist(),1)\n", "dsat_math = dict_all(df_sat,df_sat.columns.tolist(),2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Example:" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'Alabama': 593, 'Alaska': 547, 'Arizona': 563, 'Arkansas': 614, 'California': 531, 'Colorado': 606, 'Connecticut': 530, 'Delaware': 503, 'District of Columbia': 482, 'Florida': 520, 'Georgia': 535, 'Hawaii': 544, 'Idaho': 513, 'Illinois': 559, 'Indiana': 542, 'Iowa': 641, 'Kansas': 632, 'Kentucky': 631, 'Louisiana': 611, 'Maine': 513, 'Maryland': 536, 'Massachusetts': 555, 'Michigan': 509, 'Minnesota': 644, 'Mississippi': 634, 'Missouri': 640, 'Montana': 605, 'Nebraska': 629, 'Nevada': 563, 'New Hampshire': 532, 'New Jersey': 530, 'New Mexico': 577, 'New York': 528, 'North Carolina': 546, 'North Dakota': 635, 'Ohio': 578, 'Oklahoma': 530, 'Oregon': 560, 'Pennsylvania': 540, 'Rhode Island': 539, 'South Carolina': 543, 'South Dakota': 612, 'Tennessee': 623, 'Texas': 513, 'Utah': 624, 'Vermont': 562, 'Virginia': 561, 'Washington': 541, 'West Virginia': 558, 'Wisconsin': 642, 'Wyoming': 626}\n" ] } ], "source": [ "print( dsat_EBRW)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "dact_part = dict_all(df_sat,df_sat.columns.tolist(),0)\n", "dact_eng = dict_all(df_act,df_act.columns.tolist(),1)\n", "dact_math = dict_all(df_act,df_act.columns.tolist(),2)\n", "dact_read = dict_all(df_act,df_act.columns.tolist(),3)\n", "dact_sci = dict_all(df_act,df_act.columns.tolist(),4)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Example:" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'Alabama': 5.0, 'Alaska': 38.0, 'Arizona': 30.0, 'Arkansas': 3.0, 'California': 53.0, 'Colorado': 11.0, 'Connecticut': 100.0, 'Delaware': 100.0, 'District of Columbia': 100.0, 'Florida': 83.0, 'Georgia': 61.0, 'Hawaii': 55.0, 'Idaho': 93.0, 'Illinois': 9.0, 'Indiana': 63.0, 'Iowa': 2.0, 'Kansas': 4.0, 'Kentucky': 4.0, 'Louisiana': 4.0, 'Maine': 95.0, 'Maryland': 69.0, 'Massachusetts': 76.0, 'Michigan': 100.0, 'Minnesota': 3.0, 'Mississippi': 2.0, 'Missouri': 3.0, 'Montana': 10.0, 'Nebraska': 3.0, 'Nevada': 26.0, 'New Hampshire': 96.0, 'New Jersey': 70.0, 'New Mexico': 11.0, 'New York': 67.0, 'North Carolina': 49.0, 'North Dakota': 2.0, 'Ohio': 12.0, 'Oklahoma': 7.0, 'Oregon': 43.0, 'Pennsylvania': 65.0, 'Rhode Island': 71.0, 'South Carolina': 50.0, 'South Dakota': 3.0, 'Tennessee': 5.0, 'Texas': 62.0, 'Utah': 3.0, 'Vermont': 60.0, 'Virginia': 65.0, 'Washington': 64.0, 'West Virginia': 14.0, 'Wisconsin': 3.0, 'Wyoming': 3.0}\n" ] } ], "source": [ "print( dact_part)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Answer version 2**" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "def func_dict_v2(df,col_name):\n", " return {df['State'][i]:df[col_name][i] for i in range(df.shape[0])}" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "dsat_part_v2 = func_dict_v2(df_sat,'Participation')\n", "dsat_EBRW_v2 = func_dict_v2(df_sat,'EBRW')\n", "dsat_math_v2 = func_dict_v2(df_sat,'Math')" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "assert(dsat_math == dsat_math_v2)\n", "assert(dsat_part == dsat_part_v2)\n", "assert(dsat_math == dsat_math_v2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "But **let us use Version 1.**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### 8. Create one dictionary where each key is the column name, and each value is an iterable (a list or a Pandas Series) of all the values in that column.\n", "\n", "Using a simple dictionary comprehension." ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "def dict_col(df):\n", " return {col:df[col].tolist() for col in df.columns}" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'State': ['Alabama', 'Alaska', 'Arizona', 'Arkansas', 'California', 'Colorado', 'Connecticut', 'Delaware', 'District of Columbia', 'Florida', 'Georgia', 'Hawaii', 'Idaho', 'Illinois', 'Indiana', 'Iowa', 'Kansas', 'Kentucky', 'Louisiana', 'Maine', 'Maryland', 'Massachusetts', 'Michigan', 'Minnesota', 'Mississippi', 'Missouri', 'Montana', 'Nebraska', 'Nevada', 'New Hampshire', 'New Jersey', 'New Mexico', 'New York', 'North Carolina', 'North Dakota', 'Ohio', 'Oklahoma', 'Oregon', 'Pennsylvania', 'Rhode Island', 'South Carolina', 'South Dakota', 'Tennessee', 'Texas', 'Utah', 'Vermont', 'Virginia', 'Washington', 'West Virginia', 'Wisconsin', 'Wyoming'], 'Participation': [5.0, 38.0, 30.0, 3.0, 53.0, 11.0, 100.0, 100.0, 100.0, 83.0, 61.0, 55.0, 93.0, 9.0, 63.0, 2.0, 4.0, 4.0, 4.0, 95.0, 69.0, 76.0, 100.0, 3.0, 2.0, 3.0, 10.0, 3.0, 26.0, 96.0, 70.0, 11.0, 67.0, 49.0, 2.0, 12.0, 7.0, 43.0, 65.0, 71.0, 50.0, 3.0, 5.0, 62.0, 3.0, 60.0, 65.0, 64.0, 14.0, 3.0, 3.0], 'EBRW': [593, 547, 563, 614, 531, 606, 530, 503, 482, 520, 535, 544, 513, 559, 542, 641, 632, 631, 611, 513, 536, 555, 509, 644, 634, 640, 605, 629, 563, 532, 530, 577, 528, 546, 635, 578, 530, 560, 540, 539, 543, 612, 623, 513, 624, 562, 561, 541, 558, 642, 626], 'Math': [572, 533, 553, 594, 524, 595, 512, 492, 468, 497, 515, 541, 493, 556, 532, 635, 628, 616, 586, 499, 52, 551, 495, 651, 607, 631, 591, 625, 553, 520, 526, 561, 523, 535, 621, 570, 517, 548, 531, 524, 521, 603, 604, 507, 614, 551, 541, 534, 528, 649, 604]}\n", "\n", "{'State': ['Alabama', 'Alaska', 'Arizona', 'Arkansas', 'California', 'Colorado', 'Connecticut', 'Delaware', 'District of Columbia', 'Florida', 'Georgia', 'Hawaii', 'Idaho', 'Illinois', 'Indiana', 'Iowa', 'Kansas', 'Kentucky', 'Louisiana', 'Maine', 'Maryland', 'Massachusetts', 'Michigan', 'Minnesota', 'Mississippi', 'Missouri', 'Montana', 'Nebraska', 'Nevada', 'New Hampshire', 'New Jersey', 'New Mexico', 'New York', 'North Carolina', 'North Dakota', 'Ohio', 'Oklahoma', 'Oregon', 'Pennsylvania', 'Rhode Island', 'South Carolina', 'South Dakota', 'Tennessee', 'Texas', 'Utah', 'Vermont', 'Virginia', 'Washington', 'West Virginia', 'Wisconsin', 'Wyoming'], 'Participation': [100.0, 65.0, 62.0, 100.0, 31.0, 100.0, 31.0, 18.0, 32.0, 73.0, 55.0, 90.0, 38.0, 93.0, 35.0, 67.0, 73.0, 100.0, 100.0, 8.0, 28.0, 29.0, 29.0, 100.0, 100.0, 100.0, 100.0, 84.0, 100.0, 18.0, 34.0, 66.0, 31.0, 100.0, 98.0, 75.0, 100.0, 40.0, 23.0, 21.0, 100.0, 80.0, 100.0, 45.0, 100.0, 29.0, 29.0, 29.0, 69.0, 100.0, 100.0], 'English': [18.9, 18.7, 18.6, 18.9, 22.5, 20.1, 25.5, 24.1, 24.4, 19.0, 21.0, 17.8, 21.9, 21.0, 22.0, 21.2, 21.1, 19.6, 19.4, 24.2, 23.3, 25.4, 24.1, 20.4, 18.2, 19.8, 19.0, 20.9, 16.3, 25.4, 23.8, 18.6, 23.8, 17.8, 19.0, 21.2, 18.5, 21.2, 23.4, 24.0, 17.5, 20.7, 19.5, 19.5, 19.5, 23.3, 23.5, 20.9, 20.0, 19.7, 19.4], 'Math': [18.4, 19.8, 19.8, 19.0, 22.7, 20.3, 24.6, 23.4, 23.5, 19.4, 20.9, 19.2, 21.8, 21.2, 22.4, 21.3, 21.3, 19.4, 18.8, 24.0, 23.1, 25.3, 23.7, 21.5, 18.1, 19.9, 20.2, 20.9, 18.0, 25.1, 23.8, 19.4, 24.0, 19.3, 20.4, 21.6, 18.8, 21.5, 23.4, 23.3, 18.6, 21.5, 19.2, 20.7, 19.9, 23.1, 23.3, 21.9, 19.4, 20.4, 19.8], 'Reading': [19.7, 20.4, 20.1, 19.7, 23.1, 21.2, 25.6, 24.8, 24.9, 21.0, 22.0, 19.2, 23.0, 21.6, 23.2, 22.6, 22.3, 20.5, 19.8, 24.8, 24.2, 25.9, 24.5, 21.8, 18.8, 20.8, 21.0, 21.9, 18.1, 26.0, 24.1, 20.4, 24.6, 19.6, 20.5, 22.5, 20.1, 22.4, 24.2, 24.7, 19.1, 22.3, 20.1, 21.1, 20.8, 24.4, 24.6, 22.1, 21.2, 20.6, 20.8], 'Science': [19.4, 19.9, 19.8, 19.5, 22.2, 20.9, 24.6, 23.6, 23.5, 19.4, 21.3, 19.3, 22.1, 21.3, 22.3, 22.1, 21.7, 20.1, 19.6, 23.7, 2.3, 24.7, 23.8, 21.6, 18.8, 20.5, 20.5, 21.5, 18.2, 24.9, 23.2, 20.0, 23.9, 19.3, 20.6, 22.0, 19.6, 21.7, 23.3, 23.4, 18.9, 22.0, 19.9, 20.9, 20.6, 23.2, 23.5, 22.0, 20.5, 20.9, 20.6]}\n" ] } ], "source": [ "print (dict_col(df_sat))\n", "print (\"\")\n", "print (dict_col(df_act))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### 9. Merge the dataframes on the state column." ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "df_total = pd.merge(df_sat, df_act, on='State')" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
StateParticipation_xEBRWMath_xParticipation_yEnglishMath_yReadingScience
0Alabama5.0593572100.018.918.419.719.4
1Alaska38.054753365.018.719.820.419.9
2Arizona30.056355362.018.619.820.119.8
3Arkansas3.0614594100.018.919.019.719.5
4California53.053152431.022.522.723.122.2
\n", "
" ], "text/plain": [ " State Participation_x EBRW Math_x Participation_y English \\\n", "0 Alabama 5.0 593 572 100.0 18.9 \n", "1 Alaska 38.0 547 533 65.0 18.7 \n", "2 Arizona 30.0 563 553 62.0 18.6 \n", "3 Arkansas 3.0 614 594 100.0 18.9 \n", "4 California 53.0 531 524 31.0 22.5 \n", "\n", " Math_y Reading Science \n", "0 18.4 19.7 19.4 \n", "1 19.8 20.4 19.9 \n", "2 19.8 20.1 19.8 \n", "3 19.0 19.7 19.5 \n", "4 22.7 23.1 22.2 " ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_total.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### 10. Change the names of the columns so you can distinguish between the SAT columns and the ACT columns." ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
StateParticipation_SAT (%)EBRW_SATMath_SATParticipation_ACT (%)English_ACTMath_ACTReading_ACTScience_ACT
0Alabama5.0593572100.018.918.419.719.4
1Alaska38.054753365.018.719.820.419.9
2Arizona30.056355362.018.619.820.119.8
\n", "
" ], "text/plain": [ " State Participation_SAT (%) EBRW_SAT Math_SAT Participation_ACT (%) \\\n", "0 Alabama 5.0 593 572 100.0 \n", "1 Alaska 38.0 547 533 65.0 \n", "2 Arizona 30.0 563 553 62.0 \n", "\n", " English_ACT Math_ACT Reading_ACT Science_ACT \n", "0 18.9 18.4 19.7 19.4 \n", "1 18.7 19.8 20.4 19.9 \n", "2 18.6 19.8 20.1 19.8 " ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_total = pd.merge(df_sat, df_act, on='State')\n", "df_total.columns = ['State','Participation_SAT (%)','EBRW_SAT','Math_SAT',\\\n", " 'Participation_ACT (%)','English_ACT','Math_ACT','Reading_ACT','Science_ACT']\n", "df_total.head(3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### 11. Print the minimum and maximum of each numeric column in the data frame." ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Participation_SAT (%)EBRW_SATMath_SATParticipation_ACT (%)English_ACTMath_ACTReading_ACTScience_ACT
min2.0482.052.08.016.318.018.12.3
max100.0644.0651.0100.025.525.326.024.9
\n", "
" ], "text/plain": [ " Participation_SAT (%) EBRW_SAT Math_SAT Participation_ACT (%) \\\n", "min 2.0 482.0 52.0 8.0 \n", "max 100.0 644.0 651.0 100.0 \n", "\n", " English_ACT Math_ACT Reading_ACT Science_ACT \n", "min 16.3 18.0 18.1 2.3 \n", "max 25.5 25.3 26.0 24.9 " ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_total.describe().loc[['min','max']]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### 12. Write a function using only list comprehensions, no loops, to compute standard deviation. Using this function, calculate the standard deviation of each numeric column in both data sets. Add these to a list called `sd`.\n", "\n", "$$\\sigma = \\sqrt{\\frac{1}{n}\\sum_{i=1}^n(x_i - \\mu)^2}$$" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [], "source": [ "def stdev(X):\n", " n = len(X)\n", " return ((1.0/n)*np.sum([(x-np.mean(X))**2 for x in X]))**(0.5)" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[34.929, 45.217, 84.073, 31.824, 2.33, 1.962, 2.047, 3.151]\n" ] } ], "source": [ "cols = df_total.columns[1:].tolist()\n", "sd = [round( stdev([df_total[col].tolist() for col in cols][i] ) ,3) for i in range(0,len(cols))]\n", "print (sd)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that Pandas calculates `std` DataFrame using $n-1$ as denominator instead of $n$:" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Participation_SAT (%)EBRW_SATMath_SATParticipation_ACT (%)English_ACTMath_ACTReading_ACTScience_ACT
std35.27663245.66690184.90911932.1408422.3536771.9819892.0672713.182463
\n", "
" ], "text/plain": [ " Participation_SAT (%) EBRW_SAT Math_SAT Participation_ACT (%) \\\n", "std 35.276632 45.666901 84.909119 32.140842 \n", "\n", " English_ACT Math_ACT Reading_ACT Science_ACT \n", "std 2.353677 1.981989 2.067271 3.182463 " ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_total.describe().loc[['std']]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Setting the number of ${\\rm{ddof}}=0$ solves this issue and we obtain the same values as the list `sd`." ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Participation_SAT (%) 34.929071\n", "EBRW_SAT 45.216970\n", "Math_SAT 84.072555\n", "Participation_ACT (%) 31.824176\n", "English_ACT 2.330488\n", "Math_ACT 1.962462\n", "Reading_ACT 2.046903\n", "Science_ACT 3.151108\n", "dtype: float64" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_total.std(ddof=0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 2: Manipulate the dataframe" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### 13. Turn the list `sd` into a new observation in your dataset.\n", "\n", "I first put `State` as index and then concatenate the new row, renaming it." ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Participation_SAT (%)EBRW_SATMath_SATParticipation_ACT (%)English_ACTMath_ACTReading_ACTScience_ACT
State
Alabama5.0593572100.018.918.419.719.4
Alaska38.054753365.018.719.820.419.9
Arizona30.056355362.018.619.820.119.8
Arkansas3.0614594100.018.919.019.719.5
California53.053152431.022.522.723.122.2
\n", "
" ], "text/plain": [ " Participation_SAT (%) EBRW_SAT Math_SAT Participation_ACT (%) \\\n", "State \n", "Alabama 5.0 593 572 100.0 \n", "Alaska 38.0 547 533 65.0 \n", "Arizona 30.0 563 553 62.0 \n", "Arkansas 3.0 614 594 100.0 \n", "California 53.0 531 524 31.0 \n", "\n", " English_ACT Math_ACT Reading_ACT Science_ACT \n", "State \n", "Alabama 18.9 18.4 19.7 19.4 \n", "Alaska 18.7 19.8 20.4 19.9 \n", "Arizona 18.6 19.8 20.1 19.8 \n", "Arkansas 18.9 19.0 19.7 19.5 \n", "California 22.5 22.7 23.1 22.2 " ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_total_new = df_total.copy()\n", "df_total_new = df_total_new.set_index('State')\n", "df_total_new.head()" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [], "source": [ "df2 = pd.DataFrame([[34.929, 45.217, 84.073, 31.824, 2.33, 1.962, 2.047, 3.151]],columns=df_total_new.columns)\n", "df_total_new = pd.concat([df2,df_total_new])" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Participation_SAT (%)EBRW_SATMath_SATParticipation_ACT (%)English_ACTMath_ACTReading_ACTScience_ACT
sd34.92945.21784.07331.8242.331.9622.0473.151
Alabama5.000593.000572.000100.00018.9018.40019.70019.400
Alaska38.000547.000533.00065.00018.7019.80020.40019.900
Arizona30.000563.000553.00062.00018.6019.80020.10019.800
Arkansas3.000614.000594.000100.00018.9019.00019.70019.500
\n", "
" ], "text/plain": [ " Participation_SAT (%) EBRW_SAT Math_SAT Participation_ACT (%) \\\n", "sd 34.929 45.217 84.073 31.824 \n", "Alabama 5.000 593.000 572.000 100.000 \n", "Alaska 38.000 547.000 533.000 65.000 \n", "Arizona 30.000 563.000 553.000 62.000 \n", "Arkansas 3.000 614.000 594.000 100.000 \n", "\n", " English_ACT Math_ACT Reading_ACT Science_ACT \n", "sd 2.33 1.962 2.047 3.151 \n", "Alabama 18.90 18.400 19.700 19.400 \n", "Alaska 18.70 19.800 20.400 19.900 \n", "Arizona 18.60 19.800 20.100 19.800 \n", "Arkansas 18.90 19.000 19.700 19.500 " ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_total_new = df_total_new.rename(index={df_total_new.index[0]: 'sd'})\n", "df_total_new.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### 14. Sort the dataframe by the values in a numeric column (e.g. observations descending by SAT participation rate)\n", "\n", "I will start from the DataFrame without the `sd` and include it afterwards:" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Participation_SAT (%)EBRW_SATMath_SATParticipation_ACT (%)English_ACTMath_ACTReading_ACTScience_ACT
sd34.92945.21784.07331.8242.331.9622.0473.151
District of Columbia100.000482.000468.00032.00024.4023.50024.90023.500
Michigan100.000509.000495.00029.00024.1023.70024.50023.800
Connecticut100.000530.000512.00031.00025.5024.60025.60024.600
Delaware100.000503.000492.00018.00024.1023.40024.80023.600
\n", "
" ], "text/plain": [ " Participation_SAT (%) EBRW_SAT Math_SAT \\\n", "sd 34.929 45.217 84.073 \n", "District of Columbia 100.000 482.000 468.000 \n", "Michigan 100.000 509.000 495.000 \n", "Connecticut 100.000 530.000 512.000 \n", "Delaware 100.000 503.000 492.000 \n", "\n", " Participation_ACT (%) English_ACT Math_ACT \\\n", "sd 31.824 2.33 1.962 \n", "District of Columbia 32.000 24.40 23.500 \n", "Michigan 29.000 24.10 23.700 \n", "Connecticut 31.000 25.50 24.600 \n", "Delaware 18.000 24.10 23.400 \n", "\n", " Reading_ACT Science_ACT \n", "sd 2.047 3.151 \n", "District of Columbia 24.900 23.500 \n", "Michigan 24.500 23.800 \n", "Connecticut 25.600 24.600 \n", "Delaware 24.800 23.600 " ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_total_new = df_total.copy() \n", "df_total_new = df_total_new.set_index('State').sort_values(\"Participation_SAT (%)\",ascending=False)\n", "df_total_new = pd.concat([df2,df_total_new])\n", "df_total_new = df_total_new.rename(index={df_total_new.index[0]: 'sd'})\n", "\n", "df_total_new.head()" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Participation_SAT (%)
sd34.929
District of Columbia100.000
Michigan100.000
Connecticut100.000
Delaware100.000
New Hampshire96.000
Maine95.000
Idaho93.000
Florida83.000
Massachusetts76.000
Rhode Island71.000
New Jersey70.000
Maryland69.000
New York67.000
Virginia65.000
Pennsylvania65.000
Washington64.000
Indiana63.000
Texas62.000
Georgia61.000
\n", "
" ], "text/plain": [ " Participation_SAT (%)\n", "sd 34.929\n", "District of Columbia 100.000\n", "Michigan 100.000\n", "Connecticut 100.000\n", "Delaware 100.000\n", "New Hampshire 96.000\n", "Maine 95.000\n", "Idaho 93.000\n", "Florida 83.000\n", "Massachusetts 76.000\n", "Rhode Island 71.000\n", "New Jersey 70.000\n", "Maryland 69.000\n", "New York 67.000\n", "Virginia 65.000\n", "Pennsylvania 65.000\n", "Washington 64.000\n", "Indiana 63.000\n", "Texas 62.000\n", "Georgia 61.000" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_total_new[['Participation_SAT (%)']].head(20)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### 15. Use a boolean filter to display only observations with a score above a certain threshold (e.g. only states with a participation rate above 50%)\n", "\n", "I printed out the tail to check." ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Participation_SAT (%)EBRW_SATMath_SATParticipation_ACT (%)English_ACTMath_ACTReading_ACTScience_ACT
Texas62.0513.0507.045.019.520.721.120.9
Georgia61.0535.0515.055.021.020.922.021.3
Vermont60.0562.0551.029.023.323.124.423.2
Hawaii55.0544.0541.090.017.819.219.219.3
California53.0531.0524.031.022.522.723.122.2
\n", "
" ], "text/plain": [ " Participation_SAT (%) EBRW_SAT Math_SAT Participation_ACT (%) \\\n", "Texas 62.0 513.0 507.0 45.0 \n", "Georgia 61.0 535.0 515.0 55.0 \n", "Vermont 60.0 562.0 551.0 29.0 \n", "Hawaii 55.0 544.0 541.0 90.0 \n", "California 53.0 531.0 524.0 31.0 \n", "\n", " English_ACT Math_ACT Reading_ACT Science_ACT \n", "Texas 19.5 20.7 21.1 20.9 \n", "Georgia 21.0 20.9 22.0 21.3 \n", "Vermont 23.3 23.1 24.4 23.2 \n", "Hawaii 17.8 19.2 19.2 19.3 \n", "California 22.5 22.7 23.1 22.2 " ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_total_new = df_total_new[df_total_new['Participation_SAT (%)']>50]\n", "df_total_new.tail()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 3: Visualize the data" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "#### 16. Using MatPlotLib and PyPlot, plot the distribution of the Rate columns for both SAT and ACT using histograms. (You should have two histograms. You might find [this link](https://matplotlib.org/users/pyplot_tutorial.html#working-with-multiple-figures-and-axes) helpful in organizing one plot above the other.) \n", "\n", "There are certain technical criteria for the optimal choice of number of bins which will not be used here but are briefly described in the end of the notebook. We will follow the heuristical \"not too large, not too little\" bin size criterion.\n", "\n", "There is an outlier which we will exclude now for good measure:" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [], "source": [ "df_total_new = df_total_new[df_total_new.index != 'Maryland']" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([],\n", " dtype=object)" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "array([],\n", " dtype=object)" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAWYAAAECCAYAAADNQ31aAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvNQv5yAAAFlxJREFUeJzt3Xu4XHV97/F3sncIxES3lUjlFKQqfKmXSi8UQY1phXJR46Vqz2OlRQ4irXiwgCBIpHJrfSReWqRQIQICagWjyCkXEZDLUUHESmr5htupnHo8IhDqJuGSnd0/1tplsq+zd/bM+iXzfj1PniezZl2+a+a3P/Nbv7Vm1pzh4WEkSeWY23QBkqRNGcySVBiDWZIKYzBLUmEMZkkqjMEsSYXpb2rDEbELcB9wV8vkOcBnMnPlNNf1BmCvzPxoRCwD9s3M/znJ/OcBX8rM66ZfOUTEYcA2mXl2RBwBDGTm38xkXVNs5wXAp4GXAsPAeuCMzPz6qPkuB14H7JyZ6+ppfwocXc+yc73sQ/XjD2TmzeNs7wvAx4E1wNeA3YBvZeb76udfDJybmfu2LPMW4BWZeeqs7PRWzDY/re2NadMtz70ROBYYAOYBq4FjM/PBiLgMeEk96yvr54aARzPz98fZzn8D/h54M/B7wIX1U0dn5j/V8ywHfpqZ57csdzFwWmbePUu7vInGgrm2PjP3GHlQv0irI+L7mfmjaaxnT+BXADLzCuCKyWbOzMNmUmyL11C94WTmOZu5rsmcB1yXmX8MEBEvBW6NiH0y81/raTsCS4DvAn8KnFPXdRFwUT3PBcDqzDxzog1FxDuBxzJzdf2H/mBmHhQRV0fEyzNzNfBJ4JjW5TLzaxHx/ojYIzN/OKt7v3WyzU9hojZdP/cu4CRgWWbeGxFzgA8DN0TEyzLz7S3zDgO/n5m/mGRznwNOzszhiDgeeA/wAHAl8E8RsTPwemB0qH8UuDQi9s7MWf8ySNPBvInM/PeIuAfYLSLuo/ok2xV4HvBL4F2ZmRFxI/AIsDvwZeAIoC8iHgPuAd6emW+MiF+lelN3BzYC52Tm39bLnwV8H/g2cDWwF1Xv5cjMvDkidgDOBXYAfhX4N+CdwKuBZcB+EbEeWAxsn5lHRsTL6vU+j6qHuyIzL4qIpcDpwP3Ay6k+5d+XmbdO8ZK8ANguIuZm5sbM/HEdmo+2zHM48C3gMuDUiDh3hg3lY8A76v8/CTwrIrYBFgBP1b2UBzPzn8dZ9nzgZOCtM9huT7PNj2uyNn06cHhm3lu/fsMR8TfAT4D5VG23LRGxF/D8zLy9nvQksBB4DvBUPW0FcNzov6nMvD8i1tavyyZHsLOhqDHmiNib6jDke8CBwNrM3DszdwNuB45smf3RzHxpZn6MqiF+OTM/MmqVZwNrMnN3YG/g8Ih4yah5dga+XfdiPgx8OSLmAf8d+E5m7g28CFgHHJyZq6h6J5/KzM+21N5fT/+7zPzNuv4z6n2C6o9gRWb+FvB54Iw2XpJj633+eUR8PSI+BNyfmT9r2eZ7gYuBb1D9QR3Qxno3EREvB7are8UA3wSeAH4I3ED1B3oSsHyCVVwLHBgR2013273ONj/m9ZiwTUfE84BdgE3CPTOHM/OSzPyPqdY/yjuoesYjTqXqoFwMHBsR+1IdRd42wfLXAm+b5jbb0nSPebuIGDn87Qd+AfxJZj4IPBgR90fEB6ga7lLgOy3LjhkjHce+wHEAmfkY1Sc3EdE6z6OZeWk9z1URMQT8ZmZ+JiJeGxFHU/VgXk71xzOR3YBtM/Or9bp+Wo+THUAdbi2H+j8ADpmq+My8vj6UehXVod2bgI9GxB/Un/JvBvqAqzNzQ0R8CfggcNVU6x5ld+Delu1uBP7r0LceYzsf2D4iVlL1fpZn5p31/I9ExBPAC4GOjLltRWzzk5usTW+s55mtDuXuwJdGHmTmj4F9AOoPqpuAZRHxP4A/Av6d6uhipFf+ANURxaxrOpg3GW9rFRF/TnVIcxZwKdVh3K+3zDLYxvo3UB1ejazzRVR/CKPnaTUXGIqIj1OdDFhJ1cjmUR32TaSvdVst65pX/399y/ThKdZFRDwf+CuqE3W3ALdQ9UbOA/6Mqjf1F8B2wL31H942wAvqsbZ/mWz9owwzQWOvPxj2pRpju5hqnPkB4IvAa1tm3UB1kkWTs81PbtI2HRFrqDoqm5zEjIh/BE6fYKhtIhO2e6oPgy/W+3A08ArgRODdVJ0UgKfpUJsvaihjlP2BC+ozoUnVW+ybYN4NPNMYWl1HNZhPRDyHatxq11HzLI6IkUOlN1G92HfV2/90Zn4B+DmwX8v2x9ve3cDTEfG2el07Un3KfrOdnR3HI/U2j6pPcBARC4AXAz+IiN2ozlr/TmbuUv/bkepT/qhpbivr9Y5nBXB83YueT7XvG6nGnqnreg6wLdU4n2aup9t8m236Y8BnRoZnIqIvIk4C9mD6R2vjtvuoroZ6C/BZqoycQxXim7R7qg/NjhwhlhzMZwLvi4gfUR3C/YBnLoMZ7Xpg/4j4u1HTjwR+o17HrcBfZ+Ydo+Z5Ajg4Iv4Z+AjwlswcAk4BzqyXvYKqxzqy/auAIyLihJGVZObTVG/mUfUy1wGnZOYNM9h3MnMD8IdU44QPRMRqqsPKK7O6tOrPgVUjJ0FanFLvz/bT2NZqYH1E/Ebr9HqMbTAzv1tPWkE1VngDm443/2FdV9snXjSunm7ztNGm6yGYM4Av1kNC/0J1OekfzKD9Xcb452Q+AXwkM4fqceuvUQ31HUB11DjiAOAr09xmW+b08s9+RnVd6erMXNh0LU2L6jKk12TmX8xg2euBD+b0LvdSA2zzm4qIa6jOl0x0gm+i5V4MXALsPcOroCbV9BhzT6uvsviTCZ7+RGZe0q1aMvPSiFgWEa/IzLumXqISEW8FbjaU1Y6S2nztfcBZEfGmaQbsacBhnQhl6PEesySVqOQxZknqSQazJBXGYJakwmz2yb+HHvpl1wapFy6cz+Dg1ntFVi/v3+LFi9r58kExutnuJ1N6m7G+iU3W5reoHnN//0TX2m8d3D9NV+mvqfXNzBYVzJLUCwxmSSqMwSxJhTGYJakwBrMkFaaty+Ui4k7gsfrhA5n5ns6VJEm9bcpgjohtATJzacerkSS11WN+JbAgIq6t5z+x5fd5JUmzrJ1gXkf1A97nUd0J4aqIiPqH3Fm4cH7XLtLu65vLwMCCMdN3XX71jNZ3z6nTvm9pR020fyWayWt+/xkHbTH7tyXYc8VN017m9mOWdKASzbZ2gnkNcG/9u6NrIuJh4AXAg0BXv844MLCAtWvXzdr6ZnNds2G29680Q0MbJ9y/xYsXdbkaqVztXJVxKNUthUbu6fVs4P91sihJ6mXt9JjPBy6IiFuobkh46MgwhiRp9k0ZzJn5FPCuLtQiScIvmEhScQxmSSqMwSxJhTGYJakwBrMkFcZglqTCGMySVBiDWZIKYzBLUmEMZkkqjMEsSYUxmCWpMAazJBXGYJakwhjMklSYdn4oX+opETEPuBDYBRgC3puZdzdalHqKPWZprIOA/szcBzgFOL3hetRjDGZprDVAf0TMpbrH5dMN16Me41CGNNYg1TDG3cD2wBtHz7Bw4Xz6+/u6XNbmGxhY0NXt9fXN7fo2p6PU+gxmaay/BK7JzBMiYifg+oh4RWY+MTLD4OCTzVW3GdauXdfV7Q0MLOj6NqejyfoWL1404XMGszTWozwzfPEIMA/Y8rrH2mIZzNJYnwJWRsTNwDbAiZn5eMM1qYcYzNIomTkIvLPpOtS7vCpDkgpjMEtSYQxmSSqMwSxJhTGYJakwBrMkFcZglqTCGMySVBiDWZIKYzBLUmEMZkkqTFu/lRERzwfuAPbzFjuS1FlT9pjr+5+dC6zvfDmSpHaGMs4EzgF+2uFaJElMMZQREYcAD2XmNRFxwnjzdPMWO7N9G5jSbilT6m1uZsvWvn/SbJlqjPlQYDgi9gX2AC6KiGWZ+bORGbp5i53Zvg1Mabe8Kf02PJtraGjjhPs32W12pF4zaTBn5pKR/0fEjcARraEsSZp9Xi4nSYVp+9ZSmbm0g3VIkmr2mCWpMAazJBXGYJakwhjMklQYg1mSCmMwS1JhDGZJKozBLEmFMZglqTAGsyQVxmCWpMIYzJJUGINZkgpjMEtSYQxmSSqMwSxJhTGYJakwBrMkFcZglqTCtH3PP6mXRMQJwDJgG+DszDy/4ZLUQ+wxS6NExFJgH+DVwOuAnRotSD3HHrM01v7AXcAq4NnAh5otR73GYJbG2h54IfBG4NeBKyJi98wcHplh4cL59Pf3NVXfjA0MLOjq9vr65nZ9m9NRan0GszTWw8DdmfkUkBHxBLAY+PnIDIODTzZV22ZZu3ZdV7c3MLCg69ucjibrW7x40YTPOcYsjXULcEBEzImIHYFnUYW11BUGszRKZl4J3AncBnwDeH9mDjVblXqJQxnSODLzuKZrUO+yxyxJhTGYJakwBrMkFcZglqTCGMySVBiDWZIKM+XlchHRB3wOCGAIeE9m3tfpwiSpV7XTY34TQGa+Gvgo8MmOViRJPW7KYM7MrwGH1w9fCPz/jlYkST2urW/+ZeaGiLgQeCvw9tbnZvtXtnZdfvWsrWsqpf2qVKm/dDVbtvb9k2ZL21/Jzsw/i4jjge9FxEsz83HYcn9lC7r/S1tTKf2XuDbX0NDGCfdvsl/aknrNlEMZEXFwfZsdgHXARqqTgJKkDminx/xV4PMRcRMwD/hgZj7R2bIkqXdNGcz1kMU7u1CLJAm/YCJJxTGYJakwBrMkFcZglqTCGMySVBiDWZIKYzBLUmEMZkkqjMEsSYUxmCWpMAazJBXGYJakwhjMklQYg1mSCmMwS1JhDGZJKozBLEmFMZglqTAGsyQVxmCWpMIYzJJUGINZkgpjMEtSYQxmSSpMf9MFSKWKiOcDdwD7ZebdTdej3mGPWRpHRMwDzgXWN12Leo/BLI3vTOAc4KdNF6Le41CGNEpEHAI8lJnXRMQJ482zcOF8+vv7ZmV7uy6/elbW046BgQVd2xZAX9/crm9zOkqtz2CWxjoUGI6IfYE9gIsiYllm/mxkhsHBJxsrbnOsXbuuq9sbGFjQ9W1OR5P1LV68aMLnDGZplMxcMvL/iLgROKI1lKVOc4xZkgpjj1maRGYubboG9R57zJJUmEl7zPW1nCuBXYD5wGmZeUUX6pKknjVVj/ndwMOZ+VrgQOCszpckSb1tqjHmrwCXtTze0MFaJElMEcyZOQgQEYuoAvqk0fNMdqF9Ny+cn4k9V9w07WXuOfWADlRSaepi9269T6VezC+VZsqrMiJiJ2AVcHZmXjr6+S31QvuZ6uTF6KVfjL+5hoY2Trh/k11sL/WaqU7+7QBcCxyZmd/qTkmS1Num6jGfCDwXWB4Ry+tpB2amv7glSR0y1RjzUcBRXapFkoRfMJGk4hjMklQYg1mSCmMwS1JhDGZJKozBLEmFMZglqTAGsyQVxmCWpMJ4aymph8zkFxVvP2bJ1DNpVtljlqTCGMySVBiDWZIKYzBLUmEMZkkqjMEsSYUxmCWpMAazJBXGYJakwhjMklQYg1mSCmMwS1JhDGZJKozBLEmFMZglqTAGsyQVxmCWpMIYzJJUGINZkgrjPf+kUSJiHrAS2AWYD5yWmVc0WpR6ij1maax3Aw9n5muBA4GzGq5HPcYeszTWV4DLWh5vaKoQ9SaDWRolMwcBImIRVUCfNHqehQvn09/f1+3SGjEwsGDGy/b1zd2s5Tut1PoMZmkcEbETsAo4OzMvHf384OCT3S+qIWvXrpvxsgMDCzZr+U5rsr7FixdN+FxbY8wRsVdE3DhbBUkli4gdgGuB4zNzZdP1qPdM2WOOiOOAg4HHO1+OVIQTgecCyyNieT3twMxc32BN6iHtDGXcB7wN+EKHa5GKkJlHAUc1XYd615TBnJmXR8QuEz3fSydBAPZccdO0l7nn1APamq/1RMSuy6+e9nams60mlHqiRSrNZp/866WTIDPV7smF2TgRUfKJlqGhjRPWN9mJEKnX+AUTSSqMwSxJhWlrKCMz/w/wqs6WIkkCe8ySVByDWZIKYzBLUmEMZkkqjMEsSYUxmCWpMAazJBXGYJakwhjMklQYg1mSCmMwS1JhDGZJKozBLEmF8S7Z0iyayR1uStfNfbr9mCXTXmYm9XVrOzPdlj1mSSqMwSxJhTGYJakwBrMkFcZglqTCGMySVBiDWZIKYzBLUmEMZkkqjMEsSYUxmCWpMAazJBXGYJakwhjMklQYg1mSCmMwS1JhDGZJKozBLEmFMZglqTBT3vMvIuYCZwOvBJ4EDsvMeztdmNQU27ya1k6P+S3Atpm5N/BhYEVnS5IaZ5tXo9oJ5tcAVwNk5neB3+1oRVLzbPNq1Jzh4eFJZ4iI84DLM/Oq+vFPgBdl5oYu1Cd1nW1eTWunx/wfwKLWZWyg2srZ5tWodoL5VuAggIh4FXBXRyuSmmebV6OmvCoDWAXsFxH/G5gDvKezJUmNs82rUVOOMTclIk4AlgHbUF269G3gAmAYWA28PzM3NlbgZoqIecCFwC7AEPBeYANbwT5GxF7AxzNzaUS8hHH2KSJOBt5Atc8fzMzbGit4C1C3l5VU7WU+cBrwf4FvAPfUs/19Zn65kQJrEXEn8Fj98AHgXOAzVO/ztZn5sQZrOwQ4pH64LbAH8C7gE8CD9fSTM/PbXS9ulHZ6zF0XEUuBfYBXAwuAY4FPAidl5o0RcQ7wZqqezZbqIKA/M/eJiP2A04F5bOH7GBHHAQcDj9eTxrxvEfFvwOuAvYCdgMuBPZuodwvybuDhzDw4Ip4H3AmcAnwyM4u4nC8itgXIzKUt034I/BFwP/C/IuK3M/MHTdSXmRdQdRKIiM9SfdD9NnBcZl7eRE0TKfWbf/tTjeutouoRXAn8DlWvGeAqYN9mSps1a4D++ssMzwaeZuvYx/uAt7U8Hm+fXkPVexrOzJ9QvQ6Lu1vmFucrwPKWxxuoXts3RMRNEXF+RCwaf9GueSWwICKujYjrI2IJMD8z78vMYeAa4PXNlggR8bvAyzLzH6hew0Mj4uaIWBERRXRWSw3m7amuHX0HcARwCdWZ8ZFxl18Cz2mottkySHVYejfwOeBvgTlb+j7WPY+nWyaNt0/P5pnD3dbpmkBmDmbmL+vwvQw4CbgN+FBmLqHqkZ7cZI3AOuBMqo7VEcDn62kjSnmfTwRGhlS+CXwAWAIspKq7caUG88PANZn5VGYm8ASbvqGLgLWNVDZ7/pJqH3ej6mlcSDWePmJr2EeA1jHykX0afTna1rKvHRUROwE3AF/IzEuBVZl5R/30KuC3Giuusga4uD4SWkP14fsrLc83/j5HxACwe2beUE9amZn3152Hr9P8awiUG8y3AAdExJyI2BF4FvCteuwZ4EDg5qaKmyWP8kyv8RGq8eU7t7J9hPH36VZg/4iYGxE7Ux0N/aKpArcEEbEDcC1wfGaurCdfExG/V///9cAd4y7cPYdSf329/rtdADweES+OiDlUPemm2/QS4DqAuqYfRcSv1c+V8BoChZ78y8wr6/Gp26g+PN5PdYb3cxGxDfCvVIdzW7JPASsj4maqnvKJwPfZuvYR4BhG7VNmDtX7/R2eeX81uROB5wLLI2JkrPlo4NMR8RTwM+DwpoqrnQ9cEBG3UF2FcyjVEdMlQB/VeYXvNVgfQFAN+5CZwxFxGPDViFgP/JhqWLFxxV4uJ0m9qtShDEnqWQazJBXGYJakwhjMklQYg1mSCmMwS1JhDGZJKozBLEmF+U82t+M4POg3qgAAAABJRU5ErkJggg==\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "p1 = df_total_new[['Participation_SAT (%)']]\n", "p2 = df_total_new[['Participation_ACT (%)']]\n", "\n", "fig, axes = plt.subplots(1, 2)\n", "p1.hist('Participation_SAT (%)', bins=10, ax=axes[0])\n", "p2.hist('Participation_ACT (%)', bins=10, ax=axes[1])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### 17. Plot the Math(s) distributions from both data sets." ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([],\n", " dtype=object)" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "array([],\n", " dtype=object)" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAWYAAAECCAYAAADNQ31aAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvNQv5yAAAEYZJREFUeJzt3XuQZGV5x/Hv7sy6uJnVMWZRKC9EJU8KTMAYKqAR1wgGYrwREijlongjiQY0RJSAN4ixIkSIlkKpiAiICiKgJZIYETQFVhSqROXhqlKJKKKrrLus7jL545yJzToz3Tsz3eeZ5fup2trpc073eebtp3/z9jmnZ5ZNTU0hSapjedcFSJIeyGCWpGIMZkkqxmCWpGIMZkkqxmCWpGIMZkkqZrzrAroSEbsAdwBXZ+Yzt1p3DnAksCYzfzTL/fcCXp6ZR0fEWuC9mfnkbaxhJ+B0YDdgCtgIvCMzL91qu4uBZwKPy8wN7bIjgNe3mzyuve/d7e3XZuY121KLtn8Ver7nsX6tp3vW/TlwHDAJrABuBI7LzDsj4iLgSe2me7TrtgA/ycxnzaeWih60wdy6D4iIeHxmfpfmxm8ATx/gvrsDj1ng/j8I/EdmHtLuezfgKxHxtMz8drtsZ2Bf4FrgCOBMgMw8Fzi33eYc4MbMPHWB9Wj713XPz9rT7boXAycCz8/MWyNiGfBG4IsRsXtmHtyz7RTwrNl+kCxlD/Zg3gJ8HHgJ8I522UHApcDfA8sj4gxgb2A1sAx4BfA94O3AwyPiw8BHgImIuBD4XWAH4JUDzFp3Ah4aEcsz8/7M/FZEPB/4Sc82rwK+AFwEnBwRZ2WmH9fUfHXd8zB3T/8T8KrMvBUgM6ci4p3t/lcCmxbyzS8VHmNuZp2H99w+Ejin/TqAnYF9MnM3mmZ8Y2beCbwZuCYzX9Zu+xjg3Zm5J3AW8NYB9n0c8BrghxFxaUT8A3B7Zt4FEBHjwCuB84DLgUcBB8zz+5Smddbzc/V0RDwS2AX4Su99MnMqM8/PzJ9t6ze6VD3ogzkzvwZsiYinRsRjgdWZeWO7+ts0b6teHRGnAgcDE7M81G2ZeV379Q3AjgPs+z9pjg+/ELgOeB5wU3ssD+AFwBhwRWZuAi4Ejt3W71Hq1WXPM3dP39/+/6DPpQf9ALQ+ChxGM4v4aM/y5wKfbb++lOZY2LJZHuOXPV9PzbEdABGxY0S8D5jKzC9n5jsyc1+at5lHtpv9DfBQ4NaI+A5NgD8nInYf8PuSZjPynm/N2tOZ+RPgZprDKA8QEZ+IiD0GePztgsHcOA/4S+AQ4IKe5XsBl2fm+4H/pmmisXbdZpozxvP1Y2B/4Jj2BAcRsQp4IvD1iPgdmrPWT83MXdp/OwNXA8csYL8SdNDzA/b024AzIuJJ7X3GIuJEYE/gpvnue6kxmIHM/B+at3C3ZOaPe1ZdCKyNiG8AXwduA347IpbTnFF+QkR8ap773Aw8B9gHuCMibqQ5nPGZzDwb+GvgkumTID3eDhweEb81n/1K0E3PM0BPZ+YFNCclPxYRNwDfpLmc9E/aQx8PCsv8fcySVMuD/XK5oWqvsnjJLKvflZnnj7Ieadjs+cXhjFmSivEYsyQVYzBLUjELPsZ89933LsqxkImJlaxfX/ekq/UtTL/61qxZPcg1sGUsVt/Ppvrz2cta52euni8zYx4fH+u/UYesb2Gq11fNUhova118ZYJZktQwmCWpGINZkooxmCWpGINZkooZ6HK5iLge+Gl7846eX5QtbZfseXWpbzBHxA4Ambl26NVIBdjz6togM+Y9gFURcWW7/QmZee1wy5I6Zc+rU31/iVFE/B7NXxT4ILAr8Dkg2t8nzMaNv5hajIu2x8aWs2XL/f037Mio69v1pCvmdb9bTq75JwH7jd+KFWNlPvnXr+dh8fp+NtVfD70Ws9b59v22qvA6mavnB5kx3wzc2v4V25sj4h6av+58J7BoH2+cnFzFunUbFuWxhqF6fdOq1thv/NasWT3Cavqas+dh8fp+Nkul32Bp1TqtQr1z9fwgV2UcBZwGEBE7Aw8Dvr8olUk12fPq1CAz5g8B50TEl2n+4OJRvW/ppO2QPa9O9Q3mzPwF8OIR1CKVYM+ra37ARJKKMZglqRiDWZKKMZglqRiDWZKKMZglqRiDWZKKMZglqRiDWZKKMZglqRiDWZKKMZglqRiDWZKKMZglqRiDWZKKMZglqRiDWZKKMZglqRiDWZKKMZglqRiDWZKKMZglqRiDWZKKMZglqRiDWZKKMZglqRiDWZKKMZglqRiDWZKKMZglqRiDWZKKMZglqRiDWZKKGR9ko4jYEfgasH9m3jTckqQa7Ht1pe+MOSJWAGcBG4dfjlSDfa8uDXIo41TgTOB/h1yLVIl9r87MGcwR8VLg7sz8/GjKkbpn36tr/Y4xHwVMRcR+wJ7AuRHx/My8a3qDiYmVjI+PLbiQsbHlTE6uWvDjbItdT7pim+9zy8kHDKGSxTPqMRxUF8/vAoys72ezlMZrKdU6rXq9cwZzZu47/XVEXAUc3ducAOvXb1qUQiYnV7Fu3YZFeaxhql5j1fr6Pb9r1qweYTVzG2Xfz2apvB5gadU6rUK9c/W8l8tJUjEDXS4HkJlrh1iHVJJ9ry44Y5akYgxmSSrGYJakYgxmSSrGYJakYgxmSSrGYJakYgxmSSrGYJakYgxmSSrGYJakYgxmSSrGYJakYgxmSSrGYJakYgxmSSrGYJakYgxmSSrGYJakYgxmSSrGYJakYgxmSSrGYJakYgxmSSrGYJakYgxmSSrGYJakYgxmSSrGYJakYgxmSSrGYJakYgxmSSrGYJakYgxmSSpmvN8GETEGfAAIYAvwssy8bdiFSV2x59W1QWbMzwPIzKcDbwb+dagVSd2z59WpvsGcmZ8GXtXefDzwg6FWJHXMnlfX+h7KAMjMzRHxEeBFwMG96yYmVjI+PrbgQsbGljM5uWrBjzNs1Wvc67Srt/k+t5x8wBAqeaCl8vxOm6vnYfH6fjZLabyWUq3Tqtc7UDADZOaREXE8cF1E7JaZPwdYv37TohQyObmKdes2LMpjDdNSqHFbjeJ76vf8rlmzeug1bKvZeh4Wr+9ns1ReD7C0ap1Wod65er7voYyIODwi3tTe3ADcT3NCRNou2fPq2iAz5k8BH46Iq4EVwLGZed9wy5I6Zc+rU32DuX379lcjqEUqwZ5X1/yAiSQVYzBLUjEGsyQVYzBLUjEGsyQVYzBLUjEGsyQVYzBLUjEGsyQVYzBLUjEGsyQVYzBLUjEGsyQVYzBLUjEGsyQVYzBLUjEGsyQVYzBLUjEGsyQVYzBLUjEGsyQVYzBLUjEGsyQVYzBLUjEGsyQVYzBLUjEGsyQVYzBLUjEGsyQVYzBLUjEGsyQVYzBLUjEGsyQVMz7XyohYAZwN7AKsBE7JzMtGUJfUGfteXes3Yz4MuCcznwEcCLx3+CVJnbPv1ak5Z8zAJ4GLem5vHmItUhX2vTo1ZzBn5nqAiFhN06gnbr3NxMRKxsfHFlzI2NhyJidXLfhxhm0+Ne560hVDqGTx7HXa1SPZzy0nHzCS/SzUQvp+Ps/1TOMyyOthsfbVT/X+nY9Rvo7nM+b9ZsxExGOBS4D3ZeYFW69fv37TNu90JpOTq1i3bsOiPNYwLYUaq5pr7NasWT3CSvobVd/DzOMyrNeD/dsY5TjMtq+5er7fyb9HAVcCr8nMLyykOGmpsO/VtX4z5hOARwAnRcRJ7bIDM3PjcMuSOmXfq1P9jjEfAxwzolqkEux7dc0PmEhSMQazJBVjMEtSMQazJBVjMEtSMQazJBVjMEtSMQazJBVjMEtSMQazJBVjMEtSMQazJBVjMEtSMQazJBVjMEtSMQazJBVjMEtSMQazJBVjMEtSMQazJBVjMEtSMQazJBVjMEtSMQazJBVjMEtSMQazJBVjMEtSMQazJBVjMEtSMQazJBVjMEtSMQazJBVjMEtSMQazJBUzUDBHxB9FxFVDrkUqxb5XV8b7bRARbwAOB34+/HKkGux7dWmQGfNtwEHDLkQqxr5XZ/rOmDPz4ojYZbb1ExMrGR8fm3Hdriddsc0F3XLyAdt8n/nsZ772Ou3qke1rezM5uarrEga2kL7fVjONy9jY8qGM11J6DoZplK/j+Yx532DuZ/36TQt9iAdYt27Doj6e6pjruV2zZvUIK1m4xez7mcZlcnLVUF4Lvr5Gb7Yxn6vnvSpDkooxmCWpmIEOZWTmd4C9h1uKVIt9r644Y5akYgxmSSrGYJakYgxmSSrGYJakYgxmSSrGYJakYgxmSSrGYJakYgxmSSrGYJakYgxmSSrGYJakYgxmSSrGYJakYgxmSSrGYJakYgxmSSrGYJakYgxmSSrGYJakYgxmSSrGYJakYgxmSSrGYJakYgxmSSrGYJakYgxmSSrGYJakYgxmSSrGYJakYgxmSSrGYJakYsb7bRARy4H3AXsAm4BXZOatwy5M6oo9r64NMmN+IbBDZu4DvBE4bbglSZ2z59WpQYL5j4ErADLzWuAPh1qR1D17Xp1aNjU1NecGEfFB4OLM/Fx7+3vAEzJz8wjqk0bOnlfXBpkx/wxY3XsfG1TbOXtenRokmL8C/BlAROwNfGOoFUnds+fVqb5XZQCXAPtHxH8By4CXDbckqXP2vDrV9xjzYouIHYGvAfsDbwUe3a7aBbg2Mw+NiMuARwK/BDZm5oEjqu164KftzTuAs4AzgM3AlZn5ti4vpZqhvvOAU2jG6YfAEZm5odD4XQ68C7izXfYW4Bq8FO3/RcQK4Gya/l9J83x+CzgHmAJuBP42M+/vqMQHmKXe7wHvAbbQPKdHZOYPuqpx2ky1ZuZl7boXA69tr7wpZ5AZ86JpB+osYCNAZh7aLn8E8EXgde2mTwJ2z8yR/dSIiB3amtb2LLsB+AvgduCzEfEHNE/yDpm5T/s29zTgBR3Vl8C+mfmDiPhn4BXAv1Fn/E4B3pCZF/csO4gOxq+ww4B7MvPwiHgkcD1wA3BiZl4VEWfSjM8lXRbZY6Z676AJuRsi4tXA8cDruyyyNVOtl0XEnsDLad4NlTTSYAZOBc4E3rTV8rcB78nM70fEo4BJ4PKImATemZmfGUFtewCrIuJKmnF5K7AyM28DiIjPA88GdqLnUqqIGNWlVFvXdwKwtmdmMg7cV2j8TgCeCjwlIo4Fvkrzgn3ApWgjHL+qPglc1HN7M824fam9/TngOdQJ5pnqPTQzv9/eHgfuG3lVM/u1WtuAfidwLPCBTqoawMg+kh0RLwXuzszPb7V8R5rAO6dd9BCaWdQLgYOAd7fbDNsGmh8cfwocDXy4XTbtXuDhwMP41dt1gC0RMYofcFvXdz5wN0BEvAh4FnAudcbvfJp3Qa8F9gUm2uVdjV9Jmbk+M++NiNU0IXIisKzn3c5035UwU73ToRwRTwNeA7y7yxqnzVDrScCHaN6Z39tpcX2M8ndlHEVzQuUqYE/g3Ih4NHAwcEFmbmm3uws4MzM3Z+YPad5+xAjquxk4LzOnMvNmmvD4zZ71q4F1dHcp1db13QPsFBGvA44DDsjM+6gzfvcAH8vM29uQuRR4Cl6K9msi4rE0P8Q+mpkXAL3Hk6f7rowZ6iUiDqF5N/zczLy7y/p69dYK3ALsCrwfuBDYLSJO77C8WY0smDNz38x8ZnsM8gaaEwR3AfvRvF2bth/wCYCImACeDHx7BCUeRfvR24jYGVgF/DwinhgRy2hmgtfQ3aVUW9f3sHbZM4D9MvNH7XZVxu/hwHUR8Zh2/bNpTvp6KVqP9tDTlcDxmXl2u/j6iFjbfn0gTd+VMFO9EXEYzUx5bWbe3mV9vbauNTO/mpm7txl0KPCtzDy20yJnMfKrMgDaWfPRmXlTRHwTeHpmrutZfzqwN83M4V8y89MjqOkhNIdTHkdzNvz4dv+nA2M0V2X8Y89VGb9PeylVZt7UQX0nAv8OfJ1fHdP7eGa+v9D4TdCctd9Ic6XB39GcuR/5+FUVEWcAhwC9Y3AMzUnch9D8UH1lzzvKTs1Q7xjND//v8quZ/Zcy8y0dlPcAs4ztgZm5MSJ2AS7MzL07Ka6PToJZkjQ7fx+zJBVjMEtSMQazJBVjMEtSMQazJBVjMEtSMQazJBVjMEtSMf8HE/Z8pigdr+4AAAAASUVORK5CYII=\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "p1 = df_total_new[['Math_SAT']]\n", "p2 = df_total_new[['Math_ACT']]\n", "\n", "fig, axes = plt.subplots(1, 2)\n", "\n", "p1.hist('Math_SAT', bins=10, ax=axes[0])\n", "p2.hist('Math_ACT', bins=10, ax=axes[1])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### 18. Plot the Verbal distributions from both data sets.\n", "\n", "In the ACT scores I joined English and reading into one Verbal section following SAT's strategy:" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Participation_SAT (%)EBRW_SATMath_SATParticipation_ACT (%)English_ACTMath_ACTReading_ACTScience_ACTVerbal_ACT
District of Columbia100.0482.0468.032.024.423.524.923.524.65
Michigan100.0509.0495.029.024.123.724.523.824.30
Connecticut100.0530.0512.031.025.524.625.624.625.55
Delaware100.0503.0492.018.024.123.424.823.624.45
New Hampshire96.0532.0520.018.025.425.126.024.925.70
\n", "
" ], "text/plain": [ " Participation_SAT (%) EBRW_SAT Math_SAT \\\n", "District of Columbia 100.0 482.0 468.0 \n", "Michigan 100.0 509.0 495.0 \n", "Connecticut 100.0 530.0 512.0 \n", "Delaware 100.0 503.0 492.0 \n", "New Hampshire 96.0 532.0 520.0 \n", "\n", " Participation_ACT (%) English_ACT Math_ACT \\\n", "District of Columbia 32.0 24.4 23.5 \n", "Michigan 29.0 24.1 23.7 \n", "Connecticut 31.0 25.5 24.6 \n", "Delaware 18.0 24.1 23.4 \n", "New Hampshire 18.0 25.4 25.1 \n", "\n", " Reading_ACT Science_ACT Verbal_ACT \n", "District of Columbia 24.9 23.5 24.65 \n", "Michigan 24.5 23.8 24.30 \n", "Connecticut 25.6 24.6 25.55 \n", "Delaware 24.8 23.6 24.45 \n", "New Hampshire 26.0 24.9 25.70 " ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_total_new['Verbal_ACT'] = (df_total_new['English_ACT'] + df_total_new['Reading_ACT'])/2\n", "df_total_new.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next I change syntax and disposition of plots to make the x-axis labels more legible" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[],\n", " []],\n", " dtype=object)" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAWkAAAECCAYAAAA8SCbXAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvNQv5yAAAF1ZJREFUeJzt3Xu0XHV99/F3LhiMiRwuwUeXYNqK37ZeuItYE0MFuRXwwtOHhVSFAt6w0CIKSB6t0FZb462KoJTiBcTKZT2AhIsYC4IIC7koly8gKHQhGqiBpIFIwnn+2PvEYZhzZk5yZvbvhPdrrayc2bPnzGf2zPmc3/nN3numDA8PI0kq09SmA0iSRmdJS1LBLGlJKpglLUkFs6QlqWCWtCQVzJKWpIJNbzqANmwRMQz8DFjTdtVb6v9/Dvy0/noasBL4u8y8tsPth4GZwOPA+4BbgUeBeZl5a73+e4DTgD0z84p62V8Cx2bmLl2yvhv4INXPxXTgR/XtHmtZ59XAbcDxmfmpluXX1dmeB0TLY7o9M98x5kaSxmBJaxB2y8xH2hdGxFzgiczcrmXZXwJnAduMdvuI+BDwr5m5a0RcBexGVdgAfwFcDBwAXFEv+3Pgu2MFjIidgf8L7JSZ/x0R04AvAV8GDm5Z9f3A2cBREbEoM1cDZObrWx7Tz1ofk7Q+nO5QaTYHfjXalRExHdga+O960WJgQX3d84FdgOOB/Vpu9ia6lDTwYqqfh5kAmbmGqrS/2nLfs4F3AP8APAYc2NtDktadI2kNwpKIaJ3uuD8z31p//fyIuKX+elOqsjygw+2HgS2AJ4FLgEPr6xYD/xQRU4HdgR9m5h0RsTIitgceAWYBP+mScTFwLfCLiLgNuK5edmnLOn8F3J2Zd0bE14C/Bc7t4fFL68yRtAZht8zcruXfW1uue6Jl+cuAfYDvRMQftN3+NVRTGTOBJZn5G4DMfJBq5P0aqtHzJfVtLgHeTDWKvjQzxzxJTWY+Vc8dbw0soppb/hrPLOH31ssAvgnsGBG7jm9TSONjSasomfk94F7gtR2u+wnV6PWseu53xGLgjVQFPzKt8V3gDfQwHw0QEYdFxP6Z+VBmnp2ZRwI7AP87IraIiHnAq4APR8QvqN5U/F2dR+obS1pFiYhXAHOBmztdn5nfAm4APtuyeDFwGPBQZv66XnYNVam+Hriyh7t+GvhURLy0ZdkrgV8Cv6Xam+QbmblVZs7NzLlUI/u3RcTWvT06afyck9YgtM9JA5wI3MEz56ShGjgcmZl3j/H9jgJui4g9M/NyqkL+A+DTIytk5uqIuBHYPDOXdwuYmWdFxEzg0oiYQbW7393AnsBmwNuAndpu8/2I+BHVbnvHdbsPaV1M8XzSklQuR9J6Tqh3n7tmlKuXZ+a8QeaReuVIWpIK5huHklQwS1qSCrbec9JLly4fc75k1qwZrFixan3vZsKZa3zMNT7mGp/nYq45c2ZP6WW9vo+kp0+f1u+7WCfmGh9zjY+5xsdco3O6Q5IKZklLUsHcT1rqg20WXtZ0hIG78dj5TUfYIDmSlqSCWdKSVDBLWpIKZklLUsEsaUkqmCUtSQWzpCWpYJa0JBXMkpakglnSklQwS1qSCmZJS1LBLGlJKpglLUkF6+lUpRGxJXATsEdm3tXfSJKkEV1H0hGxEXA68ET/40iSWvUy3fFp4DTgoT5nkSS1GXO6IyLeDSzNzMsj4oRO68yaNWPMD2ucNm0qQ0Mz1ytkPzwXczX1aSH3nLxX3753qc/jc9H6PA+lPo8l5Oo2J30YMBwRuwPbAV+PiP0z8+GRFbp93PnQ0EyWLVu53kEnmrkGp5+PZ0PcXpPV+jwPpT6P/cw1Z87sntYbs6Qzc+2HlkXED4D3tha0JKm/3AVPkgrW86eFZ+aCPuaQJHXgSFqSCmZJS1LBLGlJKpglLUkFs6QlqWCWtCQVzJKWpIJZ0pJUMEtakgpmSUtSwSxpSSqYJS1JBbOkJalgPZ8FT5JKtPOiqxu77xuPnd99pfXkSFqSCmZJS1LBLGlJKpglLUkFs6QlqWCWtCQVzJKWpIJZ0pJUMEtakgpmSUtSwSxpSSqYJS1JBbOkJalglrQkFWzMU5VGxEbAmcBcYAZwSmZeNIBckiS6j6QPAR7NzHnA3sAX+x9JkjSi20n/vwOc13J5dR+zSJLajFnSmbkCICJmU5X1Se3rzJo1g+nTp436PaZNm8rQ0Mz1jDnxzDU4/Xw8G+L2mqzW53mYrM/jIDJ3/fisiNgKuBA4NTPPab9+xYpVY95+aGgmy5atXOeA/WKuwenn49kQt9dktT7Pw2R9Htcn85w5s3tar9sbhy8CrgCOysyr1jmNJGmddBtJnwhsCiyMiIX1sr0z84n+xpIkQfc56aOBoweURZLUxoNZJKlglrQkFcySlqSCWdKSVDBLWpIKZklLUsEsaUkqmCUtSQWzpCWpYJa0JBXMkpakglnSklQwS1qSCtb1pP/9tvOiqxu533tO3quR+23q8TbpufiYpYniSFqSCmZJS1LBLGlJKpglLUkFs6QlqWCWtCQVzJKWpIJZ0pJUMEtakgpmSUtSwSxpSSqYJS1JBbOkJalglrQkFazrqUojYipwKrAtsAo4PDPv7XcwSVJvI+m3ABtn5q7A8cCi/kaSJI3opaTfAFwGkJnXAzv1NZEkaa0pw8PDY64QEWcA52fm4vryA8AfZubqAeSTpOe0XkbSjwOzW29jQUvSYPRS0tcC+wBExOuAn/Y1kSRprV4+iPZCYI+IuA6YAhza30iSpBFd56R7FRFbAjcBewAbA6cBq4G7qXbbezoijgDeUy8/JTMvmZA77z3XzDrXKuAW4Og618eAfetcx2TmDX3OdDPwWH3xfuB04PP1/V+RmX/fxK6P7bky89CImAZ8GzgjMy+r12t6e30TOAV4CvgN8M7MXFlArjOBTwPDwKWZ+Yl6vUZzZeah9fKPAq/OzINKyAVcDPwL8GC97GPANTT8ugf+gaonnldnOCgzHx309hrRy0i6q4jYiKponqgXfQz4RGZeGhFnA/tGxI3A31DtHbIx8MOIuDIzV01Ehh5zfQX4m8y8LiJOAQ6OiDuANwK7AFsB5wM79zHTxgCZuaBl2S3A24H7gO9GxA7AXOpdH+tppkXAAQPO9UfA16i2yxn1sh1ofnslMD8zfx0R/wQcHhE/LCDXTcCBmXl/RCyJiIup/vpsNFe9fG9gb+C/6sslPI+nAB/OzPNblr2N5l/33wdOzMzrI+LtwCsiYhUD3F6tJqSkqUYPpwEn1JdvBjaLiClUbzo+BbwWuLYu5VURcS/wGuDGCcrQS66XZuZ19dfXUj35m1GNXoeBByJiekTMycylfcq0LTAzIq6g2v4fB2Zk5s8BIuJy4E3Ai2nZ9TEi+r3rY3uuE6l+uR0BfKRlvTfQ7PY6EViQmb+ur58OPFlIrl0yc3VEzAI2AR6lOs6g6VyPUP0F+3Hg8Hq9ErbXjsD2EXEMcAPV6+wZu/w28Lr/KLAlsF9EfJKqn44HPsBgt9da631YeES8G1iamZe3LL4H+AJwJ/Ai4AfAC/n9nxQAy6leyH0xSq77IuKN9df7AS8YdC5gJdUvjz2B9wL/Xi9rv//2XGsiYqJ+qfaS62zg9sy8s229prfX2cBSgIh4K7Ab8PVCco28uf4z4OE6Zwm5Tuf304wjSsi1BPggMB+YVS9v+nV/DvBK4HtUr63NgHd1yNXv7bXWRDz4w4DhiNgd2I7qB2Y7YPvMvD0iPkD1J8vlPHNXvtnAsgm4//HkOg44ISI+TPUbchXP3sWw37nuBu6tfyPfHRGPUb0Q2u9/JoPd9bE916NUo/kH29Zrens9Crw4Ig4EDgT2yswnI6KIXPUBX3PrP+WPpxpNN5lrDdXU2beBIeAlEXE8ZTyP38rMBwEi4v9RTfk9RrOv+0eAuZm5pM51CdX7WXcx2O211nqPpDNzfma+sZ7TuQV4J9Xc6uP1Kg8Bm1L9OTMvIjaOiE2AP6EacfTFKLl2BA7LzH2BzYErqaY99oyIqRGxNdWL4pF+5aL65bEIICJeQlXG/xMRf1RPD+1J9ebJoHd9bM/1QuBXHdZrenu9sF42D9i95b6bzrUJ8B8RsWl9/XLg6QJyPQ1E/XNwDPD9zPxkAbk2AX4cES+tr38T1Rv8Tb/uZwM3RcS8+vr5wO0Mfnut1a8/Iw4Hzo2I1cDvgCMy8+GI+AJVAU0FPpqZT/bp/kdzD3BpRKwElmTmpQARcQ3wozrXB/qc4d+As+o3uoapXiRPU/35N41q3uvH9Rutg9z18Vm5Oo1gMvOmhrfXe6h+uf4EWBwRAN/OzC83nOtQYE6daRXVL7jDM3NF06+vQp/HQ6mmOC6IiCeAO4CvAmto+HUP/A/wpXqa5X7gI5n5uwFvr7UmbBc8SdLE83zSklQwS1qSCmZJS1LBLGlJKlg/dxLXc1xELAEur3f5al1+LNVh3T0d7hsRHwe2yMyjxnHfC4AvZuarelz/fKrDfrfOzJVt1/0F8CGq/Yw3otp19EOZ+WBEnAe8vF512/q6NcBvM3O3XvNKo7Gk1U+nUp2s5pNty4+gOo9LEer9Y+cD11PtT39ay3UHAycB+2fmvfW+7McDSyLilZl5YMu6w8Bug9p/Vs8NlrT66ULgcxExLzOvAagPy58CXBkR+1EV4POoDs/9UGb+qB457wq8BLgVuBf4k4i4murozJuB92fm8nqUe2L9PbYEvpaZC8eZ80jgKuA84OSIOL0+Ag2qXzJHjpyJLTOH63M6PADMoDpqVeob56TVN/VBFF8F/rpl8ZFUI+yXA/8I7JOZ29fLL4iIF9TrvYzq1AKH1JdfTnXY8KupSv6kelR7LPCuzNwJeB3VYf9b9JqxPmDhCKrTn15Mda6ZverrNqc6pPratsc1nJlnZ+bjSH1mSavfvgK8JSJmR8RmVIe9n0V1PoQXA1dFdarWs6mOvByZ372+7Ui5CzJzaT3C/Xdgj/rr/YAdozrX72eoCvwF9O4AqiM9L6vP0Hgu1eHT1HnAnxM1yBef+iozH6I6jPsgqvne8zLzMapivCoztxv5RzUSHjmfy4q2b7Wm5eupwFP1qPtmYAeqw8SPozot7pRxRHw/8Hzg3oj4BdWpRd9czzf/luoEPK9rv1FE/EdEbDuO+5HWiSWtQfgS8A6qUz5+qV52FVUZ/jFAROwD3EZVmJ3sHxGbRvVJMUcAi4FtqE64dFJmXgwsoJonntZLqIh4BdUeHTtm5tz630uAq4Gj69X+Hvh8RLy8vs20iDiJ6syKd/X4+KV1Zkmr7zLzB1RnHXw8M39aL7uDah763Ii4FTiZag+K9hH0iDuAS6jOiraMao+R2+pld0XEnVRTH3fw+ymTbt4HXNjh45k+AfxVRGyRmedQzZ1/q56WuR34U+DP+/mpQtIIT7AkSQVzFzxt0CLiOKqplk7+JTPPHmQeabwcSUtSwZyTlqSCWdKSVLD1npNeunR5cfMls2bNYMWK8t94N+fEmiw5YfJkNefEG8k6Z87snvbn3yBH0tOn97SbbOPMObEmS06YPFnNOfHGm3WDLGlJ2lBY0pJUMPeTljYwOy+6uukIA3XjsfObjtBXjqQlqWCWtCQVzJKWpIJZ0pJUMEtakgpmSUtSwSxpSSqYJS1JBbOkJalglrQkFcySlqSCWdKSVDBLWpIKZklLUsF6OlVpRGwJ3ATskZl39TeSJGlE15F0RGwEnA480f84kqRWvUx3fBo4DXioz1kkSW3GnO6IiHcDSzPz8og4odM6s2bNKO5DIKdNm8rQ0MymY3Rlzok1WXLC5MpauqGhmZNqe443a7c56cOA4YjYHdgO+HpE7J+ZD4+sUOLHqA8NzWTZspVNx+jKnBNrsuSEyZW1dMuWrZxU23Mk65w5s3taf8ySzsy1Hx4WET8A3tta0JKk/nIXPEkqWM+fFp6ZC/qYQ5LUgSNpSSqYJS1JBbOkJalglrQkFcySlqSCWdKSVDBLWpIKZklLUsEsaUkqmCUtSQWzpCWpYJa0JBXMkpakgvV8FjxpMtp50dWN3O+Nx87vvpImRFPPMQzmeXYkLUkFs6QlqWCWtCQVzJKWpIJZ0pJUMEtakgpmSUtSwSxpSSqYJS1JBbOkJalglrQkFcySlqSCWdKSVDBLWpIKNuapSiNiI+BMYC4wAzglMy8aQC5JEt1H0ocAj2bmPGBv4Iv9jyRJGtHtpP/fAc5ruby6j1kkSW3GLOnMXAEQEbOpyvqk9nVmzZrB9OnT+pNuHU2bNpWhoZlNx+hom4WXNXbf95y81zrdbn23Z5OPuSndtlfJr1H1bl2ew/E+910/PisitgIuBE7NzHPar1+xYtW4Ag7C0NBMli1b2XSM4qzrNnF7jl+37eU23TCsy3M48tzPmTO7p/W7vXH4IuAK4KjMvGrcaSRJ66XbSPpEYFNgYUQsrJftnZlP9DeWJAm6z0kfDRw9oCySpDYezCJJBbOkJalglrQkFcySlqSCWdKSVDBLWpIKZklLUsEsaUkqmCUtSQWzpCWpYJa0JBXMkpakglnSklSwrif977edF13dyP3eeOz8Ru5XksbDkbQkFcySlqSCWdKSVDBLWpIKZklLUsEsaUkqmCUtSQWzpCWpYJa0JBXMkpakglnSklQwS1qSCmZJS1LBLGlJKljXU5VGxFTgVGBbYBVweGbe2+9gkqTeRtJvATbOzF2B44FF/Y0kSRrRS0m/AbgMIDOvB3bqayJJ0lpThoeHx1whIs4Azs/MxfXlB4A/zMzVA8gnSc9pvYykHwdmt97GgpakweilpK8F9gGIiNcBP+1rIknSWr18EO2FwB4RcR0wBTi0v5EkSSO6zklPBhGxC/CpzFwQETsAp1HtLngLcHRmPt1oQCAiNgLOBOYCM4BTgDuAs4Bh4GfAB5rO2ilnZl5UX/dZIDPztOYSVkbZng8A/wqsoXr+35mZv24qI4ya817gK1SDnluBD2bmmqYyjujy3B9MlXPX5hJWRtmm/wVcDNxTr/blzPx2IwFro+S8HvgqsCkwjeo1+vOxvs+kP5glIj4MnAFsXC/6CnBMZs4DHgMObipbm0OAR+tcewNfBD4DnFQvmwIc0GC+Ec/KGRFzImIxsH+z0Z6h0/b8PFWRLAAuAD7SXLy1OuX8R+DEzPwzYCblbNdOWYmI7YC/pnqNlqBTzh2Az2TmgvpfowVd65Tzn4GzM3M+cBLwx92+SS/THaX7OfA24Bv15Zdm5nX119dSFd83mwjW5jvAeS2XVwM7Av9ZX14MvJlqeqlJnXLOAj5O9UIrRaecB2Xmr+rL04EnB57q2TrlfHtmromI5wH/C2h0tN/iWVkjYnPgk8AxVCPAEoz2sxQRcQDVaPqYzFzeRLgWnXL+GXBbRHwP+AVwdLdvMulH0pl5PvBUy6L7IuKN9df7AS8YfKpny8wVmbk8ImZTPXEnAVMyc2S+aTmwSWMBa51yZub9mfnjprO1GiXnrwAi4vXAUcBnm8wIo+ZcExEvA24HtgCy0ZC1DlkXAv8G/C3V67MIo/ws3QAcV49Q7wM+1mRGGDXnXOC3mbk71fRc17/2Jn1Jd3AocEJEfBf4DfBIw3nWioitgCXANzLzHKB1/nk2sKyRYG065CxSp5wR8X+o3pPYNzOXNplvRKecmfnLzNyGKutnmszXqjUr1Yh0G+DLwLnAn0bE5xqMt1aHbXphZt5UX30hsH1j4Vp0yPkocFF99cX0cHDghljS+wKHZea+wObAlQ3nASAiXgRcAXwkM8+sF98cEQvqr/cGrmkiW6tRchanU86IOIRqBL0gM+9rMt+IUXJeFBHb1Kss55m/rBvTnjUzb8jMV9Zz/AcBd2TmMY2GZNTX6OUR8dr66zcBN3W88QCNkvOH1Ls0A/Op/poa04YwJ93uHuDSiFgJLMnMS5sOVDuR6h3dhRGxsF52NPCFem7yTp45f9WUTjn3zswnGszUSXvOacCrgF8CF0QEwH9mZtN/9nbanh8FzoqI3wErgcObCtdmsj73AH8HfK7epg8DRzYVrkWnnO8CzoiI99Hjjg0bxC54krSh2hCnOyRpg2FJS1LBLGlJKpglLUkFs6QlqWCWtCQVzJKWpIJZ0pJUsP8POuK9Fpc1I5QAAAAASUVORK5CYII=\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "columns = ['EBRW_SAT','Verbal_ACT']\n", "df_total_new.hist(column=columns, layout=(2,1))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can do other plots using seaborn as well." ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['Participation_SAT (%)', 'EBRW_SAT', 'Math_SAT', 'Participation_ACT (%)', 'English_ACT', 'Math_ACT', 'Reading_ACT', 'Science_ACT', 'Verbal_ACT']\n" ] } ], "source": [ "print( df_total_new.columns.tolist())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### 1) New ones:" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "sns.pairplot(df_total_new[['Participation_SAT (%)', 'EBRW_SAT', 'Math_SAT']], size=5)" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "sns.pairplot(df_total_new[['Participation_ACT (%)', 'Math_ACT', 'Science_ACT','Verbal_ACT',]], size=5)\n", "# point are too small though (don't know yet how to increase them)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### 2) Old ones:\n", "\n", "I plotted just for the SAT to avoid excess repetition." ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Text(0.5,1,'Participation_SAT (%)')" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "Text(0.5,1,'SAT Math')" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "Text(0.5,1,'SAT Evidence-Based Reading and Writing')" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import seaborn as sns\n", "\n", "fig, axs = plt.subplots(1, 3, figsize=(12,6))\n", "\n", "axs[0] = sns.distplot(df_total_new['Participation_SAT (%)'], kde=False, color='blue', ax=axs[0], bins=10)\n", "axs[1] = sns.distplot(df_total_new['Math_SAT'], kde=False, color='red', ax=axs[1], bins=10)\n", "axs[2] = sns.distplot(df_total_new['EBRW_SAT'], kde=False, color='green', ax=axs[2], bins=10)\n", "\n", "\n", "axs[0].set_title('Participation_SAT (%)', fontsize=8)\n", "axs[1].set_title('SAT Math', fontsize=8)\n", "axs[2].set_title('SAT Evidence-Based Reading and Writing', fontsize=8)\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### 19. When we make assumptions about how data are distributed, what is the most common assumption?" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "That they are normally distributed." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### 20. Does this assumption hold true for any of our columns? Which?" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "No. In fact some of the distributions seem to be bimodal (two peaks in the distribution). " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### 21. Plot some scatterplots examining relationships between all variables.\n", "\n", "I will make several plots and make the commentaries after them." ] }, { "cell_type": "code", "execution_count": 48, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAX4AAAEPCAYAAABFpK+YAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvNQv5yAAAIABJREFUeJzt3X98XFWd//HXJBOapEkZMEHoWmCl8MEfiBDYFhdLVaAFBNeKyuJXhIrACqxkXRVRF0FdVy12/YWgBSsuLPiV+gstVH5aKoiUqiD4KYUKXVk0U0jb0LRN2tk/zh16E5J2ks7NzOS+n49HH525P885ufO55557z7mZQqGAiIikR12lEyAiImNLgV9EJGUU+EVEUkaBX0QkZRT4RURSRoFfRCRlspVOgIwNM9sfeAJ4ODY5A3zF3a8d4bZOAqa5+7+Z2SnAse7+zztYfgFwo7vfPvKUg5mdDezm7lea2XlAzt3/YzTb2sl+9gH+E3g1UAB6gX939x8PWu5m4BhgX3ffGE07A/iXaJF9o3W7ou8XuvvSIfb3PeALwErgR8BBwB3ufm40/wDganc/NrbOPwCHuPtnypLpgelZCBwXS3fRte7+VTO7G9gPWEc4dnYD/tvdLx9m/TqgBbjK3b9oZlcBve7eGS3fCuSBm9399GhaPfAcMN3dHyt3HiVQ4E+XXnd/ffGLmf0N8IiZPejuvx/Bdo4E9gRw958AP9nRwu5+9mgSG3M08Ei0rat2cVs7sgC43d3fDWBmrwaWmdkbikHIzCYDM4D7gTOAq6J0XQdcFy2zEHjE3ecNtyMzexewzt0fiU6ea9z9RDO71cxe6+6PAF8GPhxfz91/ZGbnm9nr3f23Zc19MH9H6QY+4u4/iPKQAx41szvcfdlQ65vZvsBjZvYTYDFwWWxbxwN3AMebWYO79xGOrbUK+slS4E8xd/+zmT0OHGRmTwDfBA4EXgZsAE53d49qes8BBwM3AecB9Wa2DngcONXd32pmexMC4cHANkJNr1hT/DrwIHAPcCswjVBrvMDdl5rZy4GrgZcDewNPAe8C/h44BTjOzHqBdqDN3S8ws9dE230ZoYZ+hbtfZ2Yzgc8BTwKvBRqAc2PBaTj7AE1mVufu29z90SgoPx9b5hxCsPoB8Bkzu9rdR9ML8jLgndHnzcBEM9sNaAa2mNlbCSeD3w2x7jXApcDb4xPN7AZgubtfEX3/J2Am8H7gO4S/7TZgOaE8to0i3XGt0f/5HSzzCsLfeQOh3G40sz3d/TngZOC/ou3MiOa/BfjZLqZLdkJt/ClmZkcBU4FfAycA3e5+lLsfBPwGuCC2+PPu/mp3v4wQ3G9y908M2uSVwEp3Pxg4CjjHzKYOWmZf4J7oyuNi4CYzawBOA+5z96OAVwIbgfe6+w8JVxTz3f0bsbRno+lfc/fXRen/9yhPEE4sV7j7YYSg9+8lFMm/Rnn+q5n92Mw+Ajzp7s/G9vkBQrD6KeEkNbuE7Q5gZq8FmqJaPcAvgE3Ab4G7CCe9TwKfGmYTS4ATzKxp0PRvA2fGvp8ZTXs70BqV+ZHRvFcOs+1OM/vtoH+HxOZ/KZr2B0LT4e2EpqrB6z9pZnngo8BJ7v5nd+8BfgXMMLM6YBahEvAzwskdFPjHhGr86dJkZsXmgSyhpvYed18DrIl+rBcSTgYzgfti676kjXoIxxJ+6Lj7OkJtGzOLL/O8u98QLbPYzLYCr3P3r5jZG83sXwg109cSTkjDOQhodPdF0baeidreZxMFz1hTyEMMDIhDcvc7o6aJ6YQa6MnAv5nZm939N8DbgHrgVnfvN7MbgYsITRgjcTCwKrbfbcCLzWFm9ilCrb7NzK4lXLF8yt1XRMs/Z2abCO3tf4xt926g0cyOIJw42wm16P0JJ8W7CSeZ/3T3VQxtJE097YQgfTHw+fj6ZjaRcHW4OUpX0WLCsfVX4PEoL7cA/9/MGoFDBy0vCVDgT5cBbfxxUbPAOYSmkxsITTt/G1ukp4Tt9xOaXIrbfCUvbQboH/S9DthqZl8A/g64lhC4GwhNBMOpj+8rtq2G6HNvbHphJ9vCzPYCPk24EXsvcC8hWC4A3ke4Avog0ASsik5muwH7mNlr3P0PO9r+IAWGudqOTjzHAm8iXFl8GVgN/Dfwxtii/cDW+LruXjCzawj3HjYD10TNUKujK6+ZwJuB283sHHf/6QjS/BLu3hU1L81ie+AvznvBzN4LPAZ0RvmAEPi/Rzgx3RIt+0gU9E8Blrn7pl1Jl+ycmnqkaBaw0N2vAZxQ260fZtl+tgfYuNuBswDMbHdCbfPAQcu0m9nsaJmTgT7Ck0azCDXR7xFqg8fF9j/U/v4I9JnZnGhbk4F3EGq0o/FctM8PmVkm2mYzcADwkJkdRHiSp8Pd94/+TQZ+CXxohPvyaLtDuQL4WHQVMIGQ922Etn+idO0ONAJPD7H+QkIAfSehiat4Uv8OsMTdPwbcBhw+wjS/RNREdzzwwFDz3f15ws3py6IHCYhOkLsTrp5uiS3+c+AS1MwzJhT4pWgecK6Z/Z7QrPMQoclnKHcCs8zsa4OmXwC8KtrGMuDz7r580DKbgPea2e+ATwD/4O5bgcuBedG6PyHUuIv7XwycZ2YfL24kegLkHwiB+veEk87l7n7XKPKOu/cTgthRhBryI4Smpluix13/CfjhEE0kl0f5aRvBvh4Bes3sVfHpZnYs0OPu90eTriAE7LsY2N5/fJSuzUNs+1nC3+737v5MNPk6wkn0UTNbTgi8Xx0meUO18cefpCq28a8A/kC4H/G5HeT1esLN5Hjz0e2EZrpHY9N+Rmjm+flw25LyyWhYZhkrFvoSPOLuLZVOS6WZ2enA0e7+wVGseydw0QgfwRV5kdr4JTWip3TeM8zsL0W10zHh7jeY2Slmdoi7P7zzNQIzezuwVEFfdoVq/CIiKaM2fhGRlFHgFxFJGQV+EZGUqfqbu11dG1J3E6KlZQI9PS95Ui9VVAYqg7TnH3atDNrbW4fttKgafxXKZofrN5UeKgOVQdrzD8mVgQK/iEjKKPCLiKSMAr+ISMoo8IuIpIwCv4hIyijwV5l8PsODD4b/RUSSoMBfRRYtytLRMZHZs+vo6JjIokVV381CRGqQAn+VyOczdHY20tubYf36DL294ftY1/zz+QwrVtTpikNkHFPgrxJr1mTIDqrgZ7Nh+lgpXnGcemqzrjhExjEF/ioxZUqB/kFvo+3vD9PHQvyKY8OGyl1xiEjyEqvSRa9mWxd9XU14PdtVhBdUbwZOc/e1ZnYpcBLh3aIXufuQ7+8c79raCsyfv4nOzkYaGqCvD+bP30Rb29gE/h1dcYxVGkRkbCQS+M2sEcDdZ8am3Qlc4u73m9k7gIPMbDPhBdbTgCnAzcCRSaSpFsyZ08+MGS/Q3d1ELtc7pgG30lccIjJ2kmrqORRoNrMlZnanmR0F7AWcbGZ3A9OBB4CjgSXuXnD3p4GsmbUnlKaa0NZW4IgjGPNadvGKo6mpQGtrgaamwphecYjI2EmqqWcjMA9YABwI3AbsD1wIfDKa/j5gErA2tt4GYHegqzihpWVC6kbpq6+vI5drHvP9zp0LJ5+8jaeegv32g/b23Qgtc2OvUmVQTdJeBmnPPyRXBkkF/pXAKncvACvNLA/s7+53AZjZLcBxwB+B1th6rUB3fENpHI87l2umu3tjRfbd0ABTp4bP3d07XjZJlSyDapH2Mkh7/mHXyqC9vXXYeUk19cwFrgAws8mEgL7czN4YzZ8B/AFYBswyszoz2xeoc/d8QmkSERGSq/FfAyw0s3uBAuFE8ALwDTPLEp7y+Zi7bzGzpcB9hJPQ+QmlR0REIplCobpv3qXx1Yu6xFUZgMog7fmHXW7q0asXRUQkUOAXEUkZBX4RkZRR4BcRSRkFfpER0LDVMh4o8IuUSMNWy3ihwC9SAg1bLeOJAr9ICarhRTki5aLAL1ICDVst44kCv0gJNGy1jCe6OyVSouKLctasyTBlSkFBX2qWAr/ICLS1KeBL7VNTj4hIyijwi4ikjAK/iEjKKPCLiKRMYjd3zWwFsC76uhr4KfAlYE007VJgKXAlcCiwGTjb3VcllSYREUko8JtZI4C7z4xN+yzwUXe/OTZtDtDo7keZ2XTCe3rflkSaREQkSKrGfyjQbGZLon1cAnQAh5nZRcADwMeAo4FbAdz9fjM7IqH0iIhIJKnAvxGYBywADgQWA1cDPyA0+1wFnAdMYntzEMBWM8u6+4ud41taJpDN1ieUzOpUX19HLtdc6WRUlMpAZZD2/ENyZZBU4F8JrHL3ArDSzNYC/+3uawDM7MfAOwhBvzW2Xl086AP09GxOKInVSy+ZVhmAyiDt+Yddftn6sPOSeqpnLqG9HjObDOwO/NrMXhHNfwuwHFgGnBgtNx14OKH0iIhIJKka/zXAQjO7FygAZwEtwCIz6wUeBb4NbAWOM7NfAZloORERSVAigd/dtwCnDzFryRDTzksiDSIiMjR14BIRSRkFfhGRlFHgFxFJGQV+EZGUUeAXEUkZBX4RkZRR4BcRSRkF/kHy+QwrVtSRz2cqnRQRkUQo8McsWpSlo2Mip57aTEfHRBYt0rvoRWT8UeCP5PMZOjsb6e3NsGFDht7e8F01fxEZbxT4I2vWZMgOquBns2G6iMh4osAfmTKlQH//wGn9/WG6iMh4osAfaWsrMH/+JpqaCrS2FmhqCt/b2mo38OtGtYgMRXcvY+bM6WfGjBdYsybDlCmFmg76ixZl6exsJJsNVy7z529izpz+na8oIuOeAv8gbW21HfBh4I3qos7ORmbMeKHm8yYiu05NPeNQKTeq1Qwkkl4K/OPQzm5Uq7+CSLol9os3sxWEl6kDrHb3s6LpnwAOcffTou+XAicB/cBF7v5AUmlKi+KN6sFt/G1thZ02A+XzmXFxj0NEhpdI4DezRgB3nzlo+gnACcD/RN8PB44BpgFTgJuBI5NIU9oMd6N6R81Av/xlvW4Ii6RAUjX+Q4FmM1sS7eMSIA+cC3waODta7mhgibsXgKfNLGtm7e7elVC6UmWoG9XDNQNNnFjQDWGRlEgq8G8E5gELgAOBxcCfCC9gf1VsuUnA2tj3DcDuwIuBv6VlAtlsfULJrE719XXkcs2JbDuXg299q8A550BDA/T1he91dU00NEBv7/ZlGxqgu7uJqVMTScoODVUGXV3w1FOw337Q3j72aRprSR4HtSDt+YfkyiCpwL8SWBXV5Fea2VZgf+AmIAdMNrOLgfVAa2y9VqA7vqGens0JJbF65XLNdHdvTGz7s2bB8uUD2/Lz+Qx9fRMHLNfXB7lcL93dY1/jH1wGaeyXkPRxUO3Snn/YtTJob28ddl5ST/XMBa4AMLPJwDbAojb/i4A73f0/gGXALDOrM7N9gTp3zyeUJolpaytw2GHbXmzGqeaeyxpAT6S8kqrxXwMsNLN7gQIw191fUj1z9+VmthS4j3ASOj+h9EgJqrXn8o5uSFdLGkVqSaZQqO4fTlfXhupOYAJ0iTuwDPL5DB0dEwfceG5qKrB8+fi+8Zz24yDt+YddbuoZ9pJYHbik6lVzM5RILVKXTakJ1doMJVKLFPilZoyHAfREqoGaekREUkaBv4ZoRE0RKQcF/hqhETVFpFwU+GuAOjCJSDkp8NeAUl6sIiJSKgX+GrCzF6uIiIyEAn8NUAcmESkn3SGsEerAJCLlosBfQ9SBSUTKQU09IiIjMB760yjwi4iUaLz0p9lp4Dezj49FQpIwHs7MIlIdxlN/mlJq/MclnooEjJczs4hUh/HUn6aUaPgyMzt+qBnuvqTM6SmL+Jm5qLOzkRkzxveLO0QkOeOpP00pgX8v4DRg8GmtAAwb+M1sBbAu+roauBaYF633c3e/PFruUuAkoB+4yN0fGEkGhqJX9dW2fD7DqlWQy+nvJdWj2J+ms7ORbDYE/ST70yT5Oygl8P/R3eeOZKNm1ggQvVy9OG05cKq7rzazu8zsp4STyTHANGAKcDNw5Ej2NZTxdGZOm0WLsnR2NtLQAH19E5k/fxNz5rzkdc0iFTFW/WmS/h2U0sa/dRTbPRRoNrMlZnanmU0HpkVBvwXYHVgLHA0scfeCuz8NZM2sfRT7G0A9XWtTvIlu/fravnkm41dbW4HDDtuWaE0/6d/BTmv87n7s4Glm9lrgAnc/b5jVNhKadRYABwKLw2o2HbgReBToAiYRTgBFGwgnha7ihJaWCWSz9SVlJm7uXDj55G089RTstx+0t+8G7Dbi7VRCfX0duVxzpZMx5latgoYG6O3dPq2hAbq7m5g6tXLpqpS0HgdFac3/WPwOSn7UxczqgTnABcDLCUF9OCuBVe5eAFaa2VpgH3e/H9jfzD4LXEwI+q2x9VqB7viGeno2l5rEl2ho4MWC6u7e8bLVJJdrprt7Y6WTMeZyuQx9fRMHTOvrg1yul+7u9F2tpfU4KEpr/sv1O2hvbx12XinP8e8d3YB14BRggrsf7O7zdrDaXOCKaP3JhFr8981sj2j+BmAbsAyYZWZ1ZrYvUOfu+RLyJFWkXP0l4k10kyapiU7SaSx+B6XU+FcBXwUOc/cNZra4hHWuARaa2b2Ep3jOAtqBxWa2Gfhf4Gx37zGzpcB9hJPQ+aPJhFRO8SZU/CmHXbkJVbx51t3dRC7Xq6AvqZT07yBTKOx4g2b2buD9wB6ERzJPdfe3lDUVO9DVtSF1v/xaucTN5zN0dEwc0F+iqanA8uW73l+iVsogSWkvg7TnH3atDNrbW4e9BN9pU4+73+TuxwPvBCYDrzSzm8zsraNKjYwb46kno0ialDxIm7v/yd0/BRwAfA/4QGKpkpqg/hIitamkwG9mb4v+nwR8AZgOnJ5guqQGqL+ESG3a6c1dM/sP4EAzuwX4OvAC8AzwTeCMZJMn1U5vBhOpPaU81dPh7seZWZYwps4Ud98YPbEjojeDidSYUpp6iq24fwc84u7FW8y10Q1WREQGKKXGvzUalvlMwiBqmNmxDOphKyIitaGUGv+HCM/xrwG+aWazCL1yL0wyYSIikoxSBml7Anh3bNJt0T8AzOxcd786gbSJiEgCyvGy9XfvfBEREakW5Qj86qYpIlJDyhH49RyfiEgNKUfgFxGRGqKmHhGRlCllyIYrgFuBe9x9yxCLfLTsqRIRkcSU0oHrYeA9wFfMbDXhJHCruz8O4O6/STB9IiJSZqU8x78QWAhgZvsDM4FrzWyyux8w3HpmtgJYF31dDfwX8FmgD/grcEY05s+lhDGA+oGL3P2BUeZFRERKUNLL1qP34Z4Y/XsF8AAw7Dt3zawRwN1nxqY5MMPd/2JmnwfOjgZ6OwaYBkwhDAlx5KhyIiJDyuczGj1VBiiljf93hBr6YuBid3+0hO0eCjSb2ZJoH5cAM939L7H9bgKOBpa4ewF42syyZtbu7l2jyIuIDFLudyLL+FDKUz33A3sChwOHmdnLSlhnI+GKYBZwHnA90AVgZm8H3gRcB0xie3MQwAZg91ITLyLDy+czdHY20tubYcOGDL294Xs+rwfx0q6UNv5zAczstcAJwI1mthtwu7t/ZpjVVgKropr8SjNbC+xjZqcCpwKz3X2Tma0HWmPrtTJo1M+Wlglks/UjzVdNq6+vI5drrnQyKkplsOtlsGoVNDRAb+/2aQ0N0N3dxNSpZUhgwnQMJFcGJbXxR/4HeAxoA44itM0PF/jnAocAHzSzyYSa/VxCE9Cx7l48FJcBXzSzeYR7B3Xuno9vqKdn8wiSOD7kcs10d2/c+YLjmMpg18sgl8vQ1zdxwLS+Psjleunurv62fh0Du1YG7e2tw84rpY3/G4S2+G3AHcAvgE/HgvdQrgEWRjdvC8C50XoPAYvNDOAmd/+mmS0F7iM0O51fSoZEZOeK70Qe3MavG7ySKRR2fBCY2XuBX7j7s4OmN+0k+JdFV9eG1B2lqumoDKB8ZVCrT/XoGNjlGv+wN3NKubm7FLjEzC4zs2YAMzuR0LFLRKpcW1uBww7bVlNBX5JVShv/DYQOXPsBl5vZFmAO4VWMIiJSY0oJ/Nvc/VsAZvYn4B7g9e6+KcF0iYhIQkoJ/H2xz2uBM6PHNEVEpAaV0sYfD/LrFPRFRGpbKTX+o83sGcK4+3vGPhfcfXKiqRMRkbIrpefubmOREBERGRuldOCqB94GPEUYiuEKYDfgUnd/KtnkiYhIuZXS1HMl0EIYR2cv4DZgDfAd4M3JJU1ERJJQys3dQ9z9PcDbgT3c/VJ3X0AYwkFERGpMKYF/E4C7bwX+PMJ1ZQzk8xlWrKjTcLsiwxjNb2Q8/65Kaep5mZkdT3iSJ/55z0RTJiXRizZEdmw0v5Hx/rsqZZC27wwzq9Hd/7H8SRpIg7QNL5/P0NExkd7e7TWSpqYCy5e/UPPjsmiALpVBOfI/mt9INf2uKjlIW7O7n+XuZwGPxD6/fFSpkbJZsyZDdtA1WzYbpovI6H4jafhdlRL422OfT0oqITJyU6YU6B909dnfH6aLyOh+I2n4XZUS+DPDfJYKK75oo6mpQGtrgaamgl60IRIzmt9IGn5XpdzcLQzzWarAnDn9zJjxQk2+aENkLIzmNzLef1elBP7XmNkNhNp+/POrd7SSma0A1kVfV7v7WVEv4JuABe5+a7TcpYQmpH7gInd/YHRZSa+2tvF3YIqU02h+I+P5d1VK4H9X7PNVw3wewMwaAdx9ZmzaAcB3gSnAgmja4YSXtk+Lpt8MHFla0kVEZDRKGaTtnlFs91Cg2cyWRPu4BOgFPgB8LLbc0cCSaKjnp80sa2bt7t41in2KiEgJSqnxj8ZGYB6hZn8gsBgwd+83s/hykwgvdynaAOwOvBj4W1omkM3WJ5TM6lRfX0cu11zpZFSUykBlkPb8d3XBQw/VMWVKM+3tO19+JJIK/CuBVVFNfqWZrQX2IQzuFreeMPhbUSvQHV+gp2dzQkmsXmnvuAMqA1AZpDn/xZ7DDQ3Q11c3qp7D7e2tw85LaryduYThmzGzyYSa/f8OsdwyYJaZ1ZnZvkCdu+cTSpOISNXL5zN0djbS25th/foMvb3heznHDEoq8F8D5MzsXsJTPHPd/SWnK3dfDiwF7iPc2D0/ofSIiNSEseg5vNOxeipNY/Wkk8pAZZDW/JdrrKBdHatHRETGSLzn8KRJyfQcTurmroiIjFKx53B3dxO5XG/ZO5Ip8IuIVKG2tgJTp0J3d/lbu9XUIyKSMgr8IiIpo8AvIpIyCvwiIimjwC8ikjIK/CIiKaPALyKSMgr8IiJVKJ/P8OCDlHVwtiIFfhGRKrNoUZaOjonMnl1HR8dEFi0qb19bBX4RkSpSy8Myi4jIKIzFsMwK/CIiVWTKlAL9g95e0t8fppeLAr+ISBXRsMwiIilUs8Mym9kKYF30dTVwNfAVoB9Y4u6XmVkdcCVwKLAZONvdVyWVJhGRWpHksMyJBH4zawRw95mxab8F3gE8CfzMzA4H9gca3f0oM5tOeEH725JIk4iIBEnV+A8Fms1sSbSPTwMT3P0JADO7DXgLsA9wK4C7329mRySUHhERiSQV+DcC84AFwIHAYqA7Nn8D8EpgEtubgwC2mlnW3V+8p93SMoFstj6hZFan+vo6crnmSiejolQGKoO05x+SK4OkAv9KYJW7F4CVZrYO2DM2v5VwImiOPhfVxYM+QE/P5oSSWL1yuWa6uzdWOhkVpTJQGaQ9/7BrZdDe3jrsvKQe55xLaK/HzCYTAvwLZnaAmWWAWcBSYBlwYrTcdODhhNIjIiKRpGr81wALzexeoEA4EWwDrgfqCU/1/NrMfgMcZ2a/AjLAWQmlR0REIokEfnffApw+xKzpg5bbBpyXRBpERGRo6rkrIpIyCvwiIimjwC8ikjIK/CIiKaPALyKSMgr8IiIpo8AvIpIyCvwiIimjwC8ikjIK/CIiKaPALyKSMgr8IiIpo8AvIpIyCvwiIimjwC8ikjIK/CIiKaPALyKSMkm9ehEz2wtYDhxHeOfuVcBm4LfAh9x9m5ldCpwE9AMXufsDSaVHRESCRGr8ZtYAXA30RpO+RQjsbwTWAaeb2eHAMcA04DTgG0mkRUREBkqqqWceoYb/TPT9Fe7+q+jzMuDo6N8Sdy+4+9NA1szaE0qPiIhEyt7UY2ZnAl3ufpuZfTya/KSZHePu9wAnAxOBScDa2KobgN2Brvj2WlomkM3WlzuZVa2+vo5crrnSyagolYHKIO35h+TKIIk2/rlAwcyOBV4PXAd8BPi4mX0U+A2hrX890BpbrxXoHryxnp7NCSSxuuVyzXR3b6x0MipKZaAySHv+YdfKoL29ddh5ZQ/87j6j+NnM7gbOA04E5rr7M2b2NWAx8Bfgi2Y2D3gFUOfu+XKnR0REBkrsqZ5BHgd+bmYbgbvc/ecAZrYUuI9wr+H8MUqLiEiqZQqFQqXTsENdXRuqO4EJ0CWuygBUBmnPP+xyU09muHnqwCUikjIK/CIiKaPALyKSMgr8IiIpo8AvIpIyCvwiIimjwC8ikjIK/CIiKaPALyKSMgr8IiIpo8AvIpIyCvwiIlUon8/w4IPh/3JT4BcRqTKLFmXp6JjI7Nl1dHRMZNGi8g6krMAvIlJF8vkMnZ2N9PZmWL8+Q29v+F7Omr8Cv4hIFVmzJkN2UAU/mw3Ty0WBX0SkikyZUqC/f+C0/v4wvVwSewOXme0FLAeOAxqBq4B+YCVwtrtvM7MPAOdG0z/r7rcklR4RkVrQ1lZg/vxNdHY20tAAfX0wf/4m2trKF/gTeQOXmTUA3wdeA5wCfB74trv/3MyuB24kvHT9F8ARhBPDvcAR7j7g7ep6A1c6qQxUBmnPfz6fobu7iVyud1RBvxJv4JpHqOE/E31fAexpZhmgFegD/g5Y5u6b3X0dsAp4XULpERGpKW1tBY44grLW9IvK3tRjZmcCXe5+m5l9PJr8OPAN4JPAOuBu4NToc9EGYPfB22tpmUA2W1/uZFa1+vo6crnmSiejolQGKoO05x+SK4Mk2vjnAgUzOxZ4PXBd9P9h7v4HMzsfuAK4jVD7L2oFugdvrKdn8+BJ417aL3FBZQAqg7TnH3b5ZevDzit74Hf3GcXPZnY3cB7wI2Ao+bkNAAAHhUlEQVR9NPkZ4O+BB4DPmVkjMAF4FfBIudMjIiIDJfZUzyBnAzeaWT+wBfiAuz9rZl8FlhLuNXzC3TeNUXpERFIrkad6yklP9aSTykBlkPb8wy439Qz7VE/VB34RESkv9dwVEUkZBX4RkZRR4BcRSZmxeqpHdiDq6HYKsBtwJXAPsBAoEB5xPd/dt1UsgQmLOv2dGX1tJPT7mAl8hTCO0xJ3v6wSaRsL0RAn3wX2B7YCHyDkeyHpOQYmAN8BXkl49Pt84GWk5xiYBnzB3Wea2VSG+Nub2aXASYTyuMjdHxjt/lTjrzAzmwm8gdC34RhgCvBl4JPu/kYgA7ytYgkcA+6+0N1nuvtMwsB+/0wY8uN04GhgmpkdXsEkJu1EIOvubwAuBz5Hyo4Bwsmux92nAxcCXyclx4CZfRRYQKj0wBB/+yjvxwDTgNMIIyGMmgJ/5c0CHgZ+CPwUuAXoINT6ARYDx1YmaWPLzI4gDOx3IzDB3Z9w9wKhl/dbKpq4ZK0EsmZWB0wijGWVtmPg1YR84u4OHEl6joEngDmx70P97Y8mXPUU3P1pwvHSPtodKvBXXhthhNJ3Eno5Xw/URQc7DDOG0Th1CXAZIfitj00f72XQQ2jm+SPwbeCrQCZlx8BvgbeaWcbMphPy2xObP27LwN1vJpzsi4b620+ihLHNSqXAX3lrgdvcfUtU09nEwD/okGMYjTdmlgMOdve7CEF/p+M4jSOdhGPgIOBQQnv/brH54z3/ANcS/u53AScDvwMmxuanoQyK4vdyivku629Cgb/y7gVmRzWdyYSD/Y6o7R/gBMKwFuPdDOB2AHdfD2wxswOiobxnMb7L4Hm21+aeAxqAFSk7Bo4E7o3u8/yQ0PyVpmMgbqi//TJglpnVmdm+hFaB/Gh3oKd6KszdbzGzGYRB6+oITzOsBr5tZrsBjwE/qGASx4oBT8a+F5u96gltm7+uSKrGxnzgWjNbSqjpXwI8SLqOgceBz5jZvxJqsu8H9iU9x0Dchxn0t3f3rdHxcR/b48SoacgGEZGUUVOPiEjKKPCLiKSMAr+ISMoo8IuIpIwCv4hIyuhxTqmI6Dnl7wOPEgajagKud/evlbj+IcAe7v5LM7sROMPdtwyx3MXAnSMd0MrMLnD3r5vZbGBfd//WSNbfwXbfB7yPMBhbBviiuy+Jzf8JoefmydH344BPRLPfAPwq+vxhd18+aNsLgQsI3fsvB54G3hUN8PV1YJ67/yla9nLgRnd/tBz5ktqixzmlIqLAf567nxZ9nwA48Hp332mPRDP7NPCsu1+VUPqedfe9y7zN3QmD0L3a3bdEHfYeIJxYtpnZFLb32j3D3Z8ctP6waTKzdwN7ufvXopPHGYThLxYSTjKnufslseVzhBPtSeXMo9QG1filWrQSAlS/mR0DXBpNbyYEsS2EQezWErr1n0no2fkQ4crhYMLIpgsIgXMjYRTDLxEGfdubMMLlJML4SJe7+81mdiqhM0zx/aSnAucCe5rZlYTAfLC7X2xmH4622Q/80t0/Fp2A/hbYC9gP6HT324bJYw/hN/dPZnaLuz9hZgfEhlt+P/BjoBf4IPCvIyi/C4G3x/YzMfr3AqEsPxhf2N27zWyTmb3O3X8/gv3IOKA2fqmkN5vZ3WZ2J6GH5oXu3kMYofP/ufubgZ8QBrCDELyPj8ZlXwh8eVATzjzg8+5+FHA1cNig/bUAxwHHA182syxwEHBSNFSAA7Pc/XPAc+7+YrCMmpbeRWhueQNwoJm9NZq92d1PAD5EGHdnSO6+lTDS4oHArWb2FDA32n4dYQji7xFOVO82s6adFWC0bhPhqqErmvQZwkBvq4GphOahfzSzq8zsqNiqvye890BSRjV+qaQ7i009g/wZ+KqZ9QB/QxinBGD1UO34MUbo0o67fx/AzE6Pzb8nql3/xcyeB9qBvwLfjfZ1cHH9IRwM3O/ufdF2lxJOUAArov/XsH1M9ZcmLjTtNLn7BdH3gwgngHuBVxCuem6IFi+eCK7ZQX6L9gBeHLfF3R8D3mFm9YSrobMJg6C9k3AiPTFa9H8J5Sspoxq/VKMFwFnufibwDNubYeKjFm7jpcfvY4TBvjCz95jZhYPmd0TzXk5o8tlIaAc/jRAce2P7ygxa94+El4Fko0HDZhAGEoNwc7oUewPXm9ke0fenCAF7S7T/s919trvPJlxdlDoey1oGjtxYdA7hyghCWRUYOOLlHoQTn6SMAr9Uo+8BvzazZYSANnmIZZYDF5jZm2LTPgJ83MzuBt5DaD6K29vM7gB+RmjzXk+4mniIMAJib2xfj5rZfxVXdPeHCbXnZYR2/z8BPxpJptz9IUITzJ1R3n5JOMk9T3iz0m2xZZcBjWb2hhK2uxl41sz2Kk4zs0nATHf/qbs/DzwbpT1+BTENuGMkeZDxQU/1SCpE7/U92N0vrnRakmBm/wjs7e7zS1x+T+C7xcdGJV3Uxi9SZmb2b8Cbh5h1lruvTmi3NwLXmVlLdIN8ZzoJwz9LCqnGLyKSMmrjFxFJGQV+EZGUUeAXEUkZBX4RkZRR4BcRSRkFfhGRlPk/3daT92Q0gpsAAAAASUVORK5CYII=\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df_total_new.plot(kind='scatter', x='Participation_SAT (%)', y='EBRW_SAT', c='blue', title='Participation_SAT (%) vs EBRW')" ] }, { "cell_type": "code", "execution_count": 49, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df_total_new.plot(kind='scatter', x='Participation_ACT (%)', y='Verbal_ACT', c='red', title='Participation_SAT (%) vs Verbal')" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df_total_new.plot(kind='scatter', x='Participation_SAT (%)', y='Math_SAT', c='red', title='Participation_SAT (%) vs Math')" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df_total_new.plot(kind='scatter', x='Participation_ACT (%)', y='Math_ACT', c='red', title='Participation_ACT (%) vs Math')" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "Text(0.5,0,'Rate')" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "Text(0.5,1,'SAT correlation between Math and EBRW scores')" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "(50, 105)" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "(450, 600)" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df_aux = df_total_new.copy() # To backup\n", "plt.scatter(df_aux['Participation_SAT (%)'],df_aux['Math_SAT'], color = 'r', label='Math')\n", "plt.scatter(df_aux['Participation_SAT (%)'],df_aux['EBRW_SAT'], color = 'b', label='EBRW')\n", "plt.xlabel('Rate')\n", "plt.title('SAT correlation between Math and EBRW scores')\n", "plt.grid(True)\n", "plt.xlim([50,105])\n", "plt.ylim([450,600])\n", "plt.legend(loc='upper right')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Same plot as above with different $y$-range show the negative correlation more clearly:" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "Text(0.5,0,'Rate')" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "Text(0.5,1,'SAT correlation between Math and EBRW scores')" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "(50, 105)" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "(400, 650)" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "plt.scatter(df_aux['Participation_SAT (%)'],df_aux['Math_SAT'], color = 'r', label='Math')\n", "plt.scatter(df_aux['Participation_SAT (%)'],df_aux['EBRW_SAT'], color = 'b', label='EBRW')\n", "plt.xlabel('Rate')\n", "plt.title('SAT correlation between Math and EBRW scores')\n", "plt.grid(True)\n", "plt.xlim([50,105])\n", "plt.ylim([400,650])\n", "plt.legend(loc='upper right')\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "Text(0.5,0,'Rate')" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "Text(0.5,1,'ACT correlation between Math and Verbal')" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "plt.scatter(df_aux['Participation_ACT (%)'],df_aux['Math_ACT'], color = 'r', label='Math')\n", "plt.scatter(df_aux['Participation_ACT (%)'],df_aux['Verbal_ACT'], color = 'b', label='Verbal')\n", "plt.xlabel('Rate')\n", "plt.title('ACT correlation between Math and Verbal')\n", "plt.grid(True)\n", "plt.legend(loc='upper right')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### 22. Are there any interesting relationships to note?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Participation rate and scores seem to be \"universally\" negatively correlated. A larger fraction of students taking the tests is correlated with lower average grades for all tests in both SAT and ACT. This may have several explanations. One of them could be lack of incentive from public schools, in some states, for students to take the exams. Tesk takes would then come mostly from private schools which tend to score better. This issue must be investigated in much more detail and there are a multitude of articles online discussing the topics.\n", "\n", "- Good or bad performance in Math and Verbal tests seem to occur together i.e. student on average do well in both or in neither. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### 23. Create box plots for each variable. " ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'boxes': [],\n", " 'caps': [,\n", " ],\n", " 'fliers': [],\n", " 'means': [],\n", " 'medians': [],\n", " 'whiskers': [,\n", " ]}" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "Text(0.5,1,'Box plot for Participation_SAT (%)')" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "{'boxes': [],\n", " 'caps': [,\n", " ],\n", " 'fliers': [],\n", " 'means': [],\n", " 'medians': [],\n", " 'whiskers': [,\n", " ]}" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "Text(0.5,1,'Box plot for EBRW_SAT')" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "{'boxes': [],\n", " 'caps': [,\n", " ],\n", " 'fliers': [],\n", " 'means': [],\n", " 'medians': [],\n", " 'whiskers': [,\n", " ]}" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "Text(0.5,1,'Box plot for Math_SAT')" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "fig = plt.figure(figsize=(12,4))\n", "\n", "ax1 = fig.add_subplot(131)\n", "plt.boxplot(df_aux['Participation_SAT (%)'])\n", "ax1.set_title('Box plot for Participation_SAT (%)')\n", "\n", "ax2 = fig.add_subplot(132)\n", "plt.boxplot(df_aux['EBRW_SAT'])\n", "ax2.set_title('Box plot for EBRW_SAT')\n", "\n", "ax3 = fig.add_subplot(133)\n", "plt.boxplot(df_aux['Math_SAT'])\n", "ax3.set_title('Box plot for Math_SAT')\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'boxes': [],\n", " 'caps': [,\n", " ],\n", " 'fliers': [],\n", " 'means': [],\n", " 'medians': [],\n", " 'whiskers': [,\n", " ]}" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "Text(0.5,1,'Box plot for Participation_ACT (%)')" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "{'boxes': [],\n", " 'caps': [,\n", " ],\n", " 'fliers': [],\n", " 'means': [],\n", " 'medians': [],\n", " 'whiskers': [,\n", " ]}" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "Text(0.5,1,'Box plot for Verbal_ACT')" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "{'boxes': [],\n", " 'caps': [,\n", " ],\n", " 'fliers': [],\n", " 'means': [],\n", " 'medians': [],\n", " 'whiskers': [,\n", " ]}" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "Text(0.5,1,'Box plot for Math_ACT')" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df_aux = df_total_new.copy()\n", "df_aux['Verbal_ACT'] = (df_aux['English_ACT'] + df_aux['Reading_ACT'])/2\n", "\n", "fig = plt.figure(figsize=(12,4))\n", "\n", "ax1 = fig.add_subplot(131)\n", "plt.boxplot(df_aux['Participation_ACT (%)'])\n", "ax1.set_title('Box plot for Participation_ACT (%)')\n", "\n", "ax2 = fig.add_subplot(132)\n", "plt.boxplot(df_aux['Verbal_ACT'])\n", "ax2.set_title('Box plot for Verbal_ACT')\n", "\n", "ax3 = fig.add_subplot(133)\n", "plt.boxplot(df_aux['Math_ACT'])\n", "ax3.set_title('Box plot for Math_ACT')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### BONUS: Using Tableau, create a heat map for each variable using a map of the US. " ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "##### To be done." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 4: Descriptive and Inferential Statistics" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 24. Summarize each distribution. As data scientists, be sure to back up these summaries with statistics. (Hint: What are the three things we care about when describing distributions?)\n", "\n", "We look into the mean, standard deviation and degree of skewness. The median and the $50\\%$ percentil values are the same by definition. " ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Participation_SAT (%)EBRW_SATMath_SATParticipation_ACT (%)English_ACTMath_ACTReading_ACTScience_ACT
mean76.142857529.666667517.90476234.66666722.83333322.83333323.70476222.800000
std16.95076919.65282021.58681318.5211592.1441391.6698301.7693151.553061
50%70.000000531.000000523.00000031.00000023.50000023.30000024.40000023.300000
\n", "
" ], "text/plain": [ " Participation_SAT (%) EBRW_SAT Math_SAT Participation_ACT (%) \\\n", "mean 76.142857 529.666667 517.904762 34.666667 \n", "std 16.950769 19.652820 21.586813 18.521159 \n", "50% 70.000000 531.000000 523.000000 31.000000 \n", "\n", " English_ACT Math_ACT Reading_ACT Science_ACT \n", "mean 22.833333 22.833333 23.704762 22.800000 \n", "std 2.144139 1.669830 1.769315 1.553061 \n", "50% 23.500000 23.300000 24.400000 23.300000 " ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_aux[cols].describe().loc[['mean','std','50%']]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For example, magnitude of percentual differences between means and medians (values are in percentage so they are indeed low for the scores!):" ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Participation_SAT (%) : 8.775510204081627\n", "EBRW_SAT : 0.25109855618330457\n", "Math_SAT : 0.9742329053992527\n", "Participation_ACT (%) : 11.827956989247301\n", "English_ACT : 2.8368794326241176\n", "Math_ACT : 2.002861230329045\n", "Reading_ACT : 2.849336455893814\n", "Science_ACT : 2.145922746781126\n", "Verbal_ACT : 2.8432249726612957\n" ] } ], "source": [ "for col in df_aux.columns.tolist():\n", " print (col,\":\",np.abs(100*(df_aux[col].describe().loc['mean']/df_aux[col].describe().loc['50%']-1)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 25. Summarize each relationship. Be sure to back up these summaries with statistics.\n", "\n", "**25.1) Let us look at correlations:**" ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [], "source": [ "cols_SAT = ['Participation_SAT (%)', 'EBRW_SAT', 'Math_SAT']\n", "cols_ACT = [ 'Participation_ACT (%)', 'Math_ACT', 'Science_ACT', 'Verbal_ACT']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**SAT**\n", "\n", "- For the SAT we see that the Math and EBRW scores are highly positively correlated\n", "- The participation rate and both scores have also a relatively strong negative correlation as discussed before." ] }, { "cell_type": "code", "execution_count": 60, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Participation_SAT (%)EBRW_SATMath_SAT
Participation_SAT (%)1.000000-0.656950-0.718164
EBRW_SAT-0.6569501.0000000.961519
Math_SAT-0.7181640.9615191.000000
\n", "
" ], "text/plain": [ " Participation_SAT (%) EBRW_SAT Math_SAT\n", "Participation_SAT (%) 1.000000 -0.656950 -0.718164\n", "EBRW_SAT -0.656950 1.000000 0.961519\n", "Math_SAT -0.718164 0.961519 1.000000" ] }, "execution_count": 60, "metadata": {}, "output_type": "execute_result" } ], "source": [ "corr_SAT = df_aux[cols_SAT].corr()\n", "corr_SAT" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**ACT**\n", "\n", "- For the ACT we see that the Math, EBRW and Science scores are highly positively correlated as well\n", "- As in the SAT the participation rate and both scores have also a strong negative correlation but in the ACT it us slightly stronger" ] }, { "cell_type": "code", "execution_count": 61, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Participation_ACT (%)Math_ACTScience_ACTVerbal_ACT
Participation_ACT (%)1.000000-0.839660-0.854699-0.835211
Math_ACT-0.8396601.0000000.9879120.974774
Science_ACT-0.8546990.9879121.0000000.980839
Verbal_ACT-0.8352110.9747740.9808391.000000
\n", "
" ], "text/plain": [ " Participation_ACT (%) Math_ACT Science_ACT \\\n", "Participation_ACT (%) 1.000000 -0.839660 -0.854699 \n", "Math_ACT -0.839660 1.000000 0.987912 \n", "Science_ACT -0.854699 0.987912 1.000000 \n", "Verbal_ACT -0.835211 0.974774 0.980839 \n", "\n", " Verbal_ACT \n", "Participation_ACT (%) -0.835211 \n", "Math_ACT 0.974774 \n", "Science_ACT 0.980839 \n", "Verbal_ACT 1.000000 " ] }, "execution_count": 61, "metadata": {}, "output_type": "execute_result" } ], "source": [ "corr_ACT = df_aux[cols_ACT].corr()\n", "corr_ACT" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**25.2) Skewness general observations**\n", "\n", "The skewness is an important metric and is related to the mean and median. It can be measured using Pearson's coefficient, namely:\n", "\n", "$${g_1} = \\frac{{{\\mu _3}}}{{{\\sigma ^3}}}$$\n", "\n", "\n", "where the numerator is the third momentum. One way to relate the difference between mean and median to the skewness is to use (which is valid *for some particular distributions*):\n", "\n", "$$ g_1 = 2(\\mu-{\\rm{mode}})/\\sigma$$ \n", "\n", "or\n", "\n", "$$ \\mu-{\\rm{mode}} = (1/2) g_1 \\sigma$$\n", "\n", "Though this equality is not valid in general, *an estimation of skewness as an indirect measure of the difference between mode and median and vice-versa may be used heuristically.*\n", "\n", "**25.3) In our case**\n", "\n", "In our case, consider the Math scores for example. The median is larger so $g_1<0$. If the left tail is more pronounced than the right one the function has negative skewness. As we see below (where I reproduced the plots), the left tail is indeed more pronounced." ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "g1 from SAT Math is: -0.4837273105976173\n" ] } ], "source": [ "g1_SAT = 2*(np.mean(df_aux['Math_SAT'])-np.median(df_aux['Math_SAT']))/ np.std(df_aux['Math_SAT'])\n", "print (\"g1 from SAT Math is:\", g1_SAT)" ] }, { "cell_type": "code", "execution_count": 63, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "g1 from ACT is: -0.5727420648104604\n" ] } ], "source": [ "g1_ACT = 2*(np.mean(df_aux['Math_ACT'])-np.median(df_aux['Math_ACT']))/ np.std(df_aux['Math_ACT'])\n", "print (\"g1 from ACT is:\", g1_ACT)" ] }, { "cell_type": "code", "execution_count": 64, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([],\n", " dtype=object)" ] }, "execution_count": 64, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "array([],\n", " dtype=object)" ] }, "execution_count": 64, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAWYAAACVCAYAAABrV1AVAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvNQv5yAAADUNJREFUeJzt3X2QXXV9x/F3soFgupG1NOFhgKYK/XaIU2IpM6A1hEooqa1SSgdGeVAEpS00KLQgTSoIRabCANWBZGoxDRCjQmlAR6C1IpQOSIHMiJAvjxWmIxKRVNKEjAnbP85ZuVn24WZ37z2/Je/XTCb3nnPuPd/93e/93PO0e6f09/cjSSrH1KYLkCRtz2CWpMIYzJJUGINZkgpjMEtSYQxmSSqMwSxJhZnWdAFNiYg5wLPAPZl5xKB5K4BTgVmZ+ZNhHn8o8LHMPDMiFgBfzMx37mANewNXAwcB/cBm4LLMXDNouVuAI4D9M3NTPe0U4FP1IvvXj11f3z87M+/dkVr05ldCz7c81xt6umXeHwDnAX3ALsCjwHmZ+XxE3AwcUC96cD1vG/ByZh45llpKtNMGc+1VICLiVzPzh1R3fgl4TxuPnQvsO871fwn4t8w8oV73QcB9EfHuzHy8nrYPMB+4HzgFWAaQmSuBlfUyK4BHM/OKcdajN7+me37Ynq7nfQhYAnwgM5+KiCnABcB3ImJuZh7fsmw/cORwHyST2c4ezNuArwIfBi6rpx0HrAHOBaZGxDXAYcBMYApwOvAc8Flg94j4MvBPQG9ErAZ+A9gNOKONrda9gbdExNTMfC0zH4uIDwAvtyzzceDbwM3AJRGxPDP9dU2NVdM9DyP39N8CH8/MpwAysz8iLq/XPx3YMp4ffrLwGHO11Xlyy/1TgRX17QD2AQ7PzIOomvGCzHwe+Bvg3sz8aL3svsBVmTkPWA5c1Ma6zwPOAl6MiDUR8ZfAM5n5AkBETAPOAG4Ebgf2BI4Z488pDWis50fq6YjYA5gD3Nf6mMzsz8ybMvNnO/qDTlY7fTBn5kPAtog4JCL2A2Zm5qP17Mepdqs+ERFXAMcDvcM81dOZ+UB9ey0wu411/zvV8eFjgQeAPwTW1cfyAD4I9AB3ZOYWYDVwzo7+jFKrJnuekXv6tfr/nT6XdvoBqN0AnES1FXFDy/T3A9+sb6+hOhY2ZZjn+HnL7f4RlgMgImZHxLVAf2b+R2ZelpnzqXYzT60X+zPgLcBTEfHfVAF+dETMbfPnkobT9Z6vDdvTmfky8ATVYZTtRMTXIuLgNp7/TcFgrtwI/AlwArCqZfqhwO2ZeR3wX1RN1FPP20p1xnisfgosBBbXJziIiBnAO4CHI+LXqc5aH5KZc+p/+wD3AIvHsV4JGuj5Nnv6YuCaiDigfkxPRCwB5gHrxrruycZgBjLzf6h24Z7MzJ+2zFoNLIiI7wMPA08DvxYRU6nOKL89Iv55jOvcChwNHA48GxGPUh3O+EZmXg/8KXDrwEmQFp8FTo6IXxnLeiVopudpo6czcxXVScmvRMRa4AdUl5P+bn3oY6cwxb/HLEll2dkvl+uo+iqLDw8z+/OZeVM365E6zZ6fGG4xS1JhPMYsSYUxmCWpMOM+xrx+/SsTciykt3c6GzeWe9LV+sZntPpmzZrZzjWwxZiovh9O6a9nK2sdm5F6vpgt5mnTekZfqEHWNz6l11eayTRe1jrxiglmSVLFYJakwhjMklQYf8GkUIdeec+YHvfgufMnuBKpe8ba9zuq9PeJW8ySVBiDWZIKYzBLUmEMZkkqjMEsSYUxmCWpMAazJBXGYJakwhjMklQYg1mSCmMwS1JhDGZJKozBLEmFMZglqTAGsyQVxmCWpMIYzJJUmLa+wSQiZgMPAQszc11nS5LKYN+rKaNuMUfELsByYHPny5HKYN+rSe1sMV8BLAM+PdTM3t7pTJvWM+5Cenqm0tc3Y9zPsyMOXHrHDj/myUuO6UAlE6fbY9iuJl7fcepK3w9nMo3XZKp1QOn1jhjMEfERYH1m3hkRQzboxo1bJqSQvr4ZbNiwaUKeq5NKr7HU+kZ7fWfNmtnFakbWzb4fzmR5P8DkqnVACfWO1POjHco4DVgYEXcD84CVEbHXxJUmFcm+V6NG3GLOzF98x3fdpGdm5gudLkpqkn2vpnm5nCQVpq3L5QAyc0EH65CKZN+rCW4xS1JhDGZJKozBLEmFMZglqTAGsyQVxmCWpMIYzJJUGINZkgpjMEtSYQxmSSqMwSxJhTGYJakwBrMkFcZglqTCGMySVJi2/x6zJodDr7xnhx/z4LnzR19IUte4xSxJhTGYJakwBrMkFcZglqTCGMySVBiDWZIKYzBLUmEMZkkqjMEsSYUxmCWpMAazJBXGYJakwhjMklQYg1mSCmMwS1JhDGZJKozBLEmFGfEbTCJiF+B6YA4wHbg0M2/rQl1SY+x7NW20LeaTgJcy873AIuCLnS9Japx9r0aN9p1/Xwdubrm/tYO1SKWw79WoEYM5MzcCRMRMqkZdMniZ3t7pTJvWM+5Cenqm0tc3Y9zP02ljqfHApXd0oJKJM5YvcB2LJy85pivrGa/x9P1YXuuhxqWd98NErWs0pffvWHTzfTyWMR/1W7IjYj/gVuDazFw1eP7GjVt2eKVD6eubwYYNmybkuTppMtRYqpHGbtasmV2sZHTd6nsYelw69X6wfyvdHIfh1jVSz4928m9P4C7grMz89niKkyYL+15NG22L+ULgbcDSiFhaT1uUmZs7W5bUKPtejRrtGPNiYHGXapGKYN+raf6CiSQVxmCWpMIYzJJUGINZkgpjMEtSYQxmSSqMwSxJhTGYJakwBrMkFcZglqTCGMySVBiDWZIKYzBLUmEMZkkqjMEsSYUxmCWpMKN+5994jOVLPh88d35X1jNW3VyXpM4o/X3sFrMkFcZglqTCGMySVBiDWZIKYzBLUmEMZkkqjMEsSYUxmCWpMAazJBXGYJakwhjMklQYg1mSCmMwS1JhDGZJKozBLEmFMZglqTCj/qH8iJgKXAscDGwBTs/MpzpdmNQUe15Na2eL+Vhgt8w8HLgAuLKzJUmNs+fVqHaC+XeAOwAy837gtztakdQ8e16NmtLf3z/iAhHxJeCWzPxWff854O2ZubUL9UldZ8+rae1sMf8MmNn6GBtUb3L2vBrVTjDfB/w+QEQcBny/oxVJzbPn1ahRr8oAbgUWRsR/AlOAj3a2JKlx9rwaNeox5okWEbOBh4CFwEXAXvWsOcD9mXliRNwG7AH8HNicmYu6VNsjwP/Wd58FlgPXAFuBuzLz4iYvpRqivhuBS6nG6UXglMzcVND43Q58Hni+nvYZ4F68FO0XImIX4Hqq/p9O9Xo+BqwA+oFHgT/PzNcaKnE7w9T7HPAFYBvVa3pKZv64qRoHDFVrZt5Wz/sQcHZ95U1x2tlinjD1QC0HNgNk5on19LcB3wE+WS96ADA3M7v2qRERu9U1LWiZthb4Y+AZ4JsR8VtUL/JumXl4vZt7JfDBhupLYH5m/jgiPgecDvw95YzfpcBfZeYtLdOOo4HxK9hJwEuZeXJE7AE8AqwFlmTm3RGxjGp8bm2yyBZD1fssVcitjYhPAOcDn2qyyNpQtd4WEfOAj1HtDRWpq8EMXAEsAz49aPrFwBcy80cRsSfQB9weEX3A5Zn5jS7UdjAwIyLuohqXi4Dpmfk0QETcCbwP2JuWS6kioluXUg2u70JgQcuWyTTg1YLG70LgEOBdEXEO8D2qN+x2l6J1cfxK9XXg5pb7W6nG7bv1/W8BR1NOMA9V74mZ+aP6/jTg1a5XNbQ31FoH9OXAOcA/NFJVG7r2K9kR8RFgfWbeOWj6bKrAW1FP2pVqK+pY4DjgqnqZTttE9cHxe8CZwJfraQNeAXYH3srru+sA2yKiGx9wg+u7CVgPEBF/BBwJrKSc8buJai/obGA+0FtPb2r8ipSZGzPzlYiYSRUiS4ApLXs7A31XhKHqHQjliHg3cBZwVZM1Dhii1qXAP1Ltmb/SaHGj6ObfyjiN6oTK3cA8YGVE7AUcD6zKzG31ci8AyzJza2a+SLX7EV2o7wngxszsz8wnqMLjl1vmzwQ20NylVIPrewnYOyI+CZwHHJOZr1LO+L0EfCUzn6lDZg3wLrwU7Q0iYj+qD7EbMnMV0Ho8eaDvijFEvUTECVR7w+/PzPVN1teqtVbgSeBA4DpgNXBQRFzdYHnD6lowZ+b8zDyiPga5luoEwQvAUVS7awOOAr4GEBG9wDuBx7tQ4mnUv3obEfsAM4D/i4h3RMQUqi3Be2nuUqrB9b21nvZe4KjM/Em9XCnjtzvwQETsW89/H9VJXy9Fa1EferoLOD8zr68nPxIRC+rbi6j6rghD1RsRJ1FtKS/IzGearK/V4Foz83uZObfOoBOBxzLznEaLHEbXr8oAqLeaz8zMdRHxA+A9mbmhZf7VwGFUWw5/l5n/0oWadqU6nLI/1dnw8+v1Xw30UF2V8dctV2X8JvWlVJm5roH6lgD/CjzM68f0vpqZ1xU0fr1UZ+03U11p8BdUZ+67Pn6liohrgBOA1jFYTHUSd1eqD9UzWvYoGzVEvT1UH/4/5PUt++9m5mcaKG87w4ztoszcHBFzgNWZeVgjxY2ikWCWJA3Pv8csSYUxmCWpMAazJBXGYJakwhjMklQYg1mSCmMwS1JhDGZJKsz/A23XRPNh43AHAAAAAElFTkSuQmCC\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "p1 = df_total_new[['Math_SAT']]\n", "p2 = df_total_new[['Math_ACT']]\n", "\n", "fig, axes = plt.subplots(1, 2,figsize=(6,2))\n", "\n", "p1.hist('Math_SAT', bins=10, ax=axes[0])\n", "p2.hist('Math_ACT', bins=10, ax=axes[1])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### 26. Execute a hypothesis test comparing the SAT and ACT participation rates. Use $\\alpha = 0.05$. Be sure to interpret your results." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The full population of high-school graduates is divided into two groups namely, SAT test takers and ACT test takers. \n", "\n", "Suppose I want to test whether the true population mean of the participation rates are different between the SAT and ACT. Our hypotheses have the form: \n", "\n", "\n", "$${H_0}:\\,{\\mu _{{\\rm{PR,SAT}}}} = \\,{\\mu _{{\\rm{PR,ACT}}}}\\\\\n", "{H_{\\rm{A}}}:\\,{\\mu _{{\\rm{PR,SAT}}}} \\ne {\\mu _{{\\rm{PR,ACT}}}}$$\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To obtain the $p$-value we use:" ] }, { "cell_type": "code", "execution_count": 65, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "('p-value is:', 3.0499821970067098e-09)" ] }, "execution_count": 65, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\"p-value is:\",stats.ttest_ind(df_aux['Participation_SAT (%)'],df_aux['Participation_ACT (%)'])[1]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is much smaller than $\\alpha$ so we reject the null hypothesis and conclude that both means differ at this level of confidence." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### 27. Generate and interpret 95% confidence intervals for SAT and ACT participation rates.\n", "\n", "From below I conclude that I am 95$\\%$ confident that the true mean rate for the SAT participation is within (26.261598995475964, 43.071734337857364) and that the true mean rate for the ACT participation is within (68.450446510874016, 83.835267774840261)." ] }, { "cell_type": "code", "execution_count": 66, "metadata": {}, "outputs": [], "source": [ "#print df_aux['Participation_ACT (%)'].mean()\n", "#print df_aux['Participation_SAT (%)'].mean()" ] }, { "cell_type": "code", "execution_count": 67, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Participation_SAT (%)EBRW_SATMath_SATParticipation_ACT (%)English_ACTMath_ACTReading_ACTScience_ACTVerbal_ACT
District of Columbia100.0482.0468.032.024.423.524.923.524.65
Michigan100.0509.0495.029.024.123.724.523.824.30
Connecticut100.0530.0512.031.025.524.625.624.625.55
Delaware100.0503.0492.018.024.123.424.823.624.45
New Hampshire96.0532.0520.018.025.425.126.024.925.70
\n", "
" ], "text/plain": [ " Participation_SAT (%) EBRW_SAT Math_SAT \\\n", "District of Columbia 100.0 482.0 468.0 \n", "Michigan 100.0 509.0 495.0 \n", "Connecticut 100.0 530.0 512.0 \n", "Delaware 100.0 503.0 492.0 \n", "New Hampshire 96.0 532.0 520.0 \n", "\n", " Participation_ACT (%) English_ACT Math_ACT \\\n", "District of Columbia 32.0 24.4 23.5 \n", "Michigan 29.0 24.1 23.7 \n", "Connecticut 31.0 25.5 24.6 \n", "Delaware 18.0 24.1 23.4 \n", "New Hampshire 18.0 25.4 25.1 \n", "\n", " Reading_ACT Science_ACT Verbal_ACT \n", "District of Columbia 24.9 23.5 24.65 \n", "Michigan 24.5 23.8 24.30 \n", "Connecticut 25.6 24.6 25.55 \n", "Delaware 24.8 23.6 24.45 \n", "New Hampshire 26.0 24.9 25.70 " ] }, "execution_count": 67, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_aux.head()" ] }, { "cell_type": "code", "execution_count": 68, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(26.261598995475964, 43.071734337857364)" ] }, "execution_count": 68, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stats.t.interval(0.95,df_aux['Participation_ACT (%)'].shape[0],loc = df_aux['Participation_ACT (%)'].mean(),\n", " scale = (np.std(df_aux['Participation_ACT (%)'], ddof = 1)) / df_aux['Participation_ACT (%)'].shape[0] ** 0.5)" ] }, { "cell_type": "code", "execution_count": 70, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(68.45044651087402, 83.83526777484026)" ] }, "execution_count": 70, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stats.t.interval(0.95,df_aux['Participation_SAT (%)'].shape[0],loc = df_aux['Participation_SAT (%)'].mean(),\n", " scale = (np.std(df_aux['Participation_SAT (%)'], ddof = 1)) / df_aux['Participation_SAT (%)'].shape[0] ** 0.5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### 28. Given your answer to 26, was your answer to 27 surprising? Why?" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "Not surprising since confidence that they differ must be associated with lack of overlap between confidence intervals.\n", "If we are confident that they are likely to belong to different intervals, it is intuitive that they should differ." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### 29. Is it appropriate to generate correlation between SAT and ACT math scores? Why?" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "Probably yes, since they have the same units and compare reasonably similar quantities (details about the test takers and some other factors for example may change that rationale)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Commentary about bin size\n", "- To choose the size of the bins one can use the Freedman–Diaconis rule which aims to minimize the difference between the area under the empirical and theoretical distributions ${\\rm{bin\\,\\, size}} = 2\\frac{{{\\rm{IQR}}(x)}}{{\\sqrt[3]{n}}}$ where $n$ is the number of observations in the sample and IQR is the difference between 75th and 25th percentiles. The package `astropy.visualization` contains this options and will be used in future projects." ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python [conda env:anaconda3]", "language": "python", "name": "conda-env-anaconda3-py" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" } }, "nbformat": 4, "nbformat_minor": 1 }