{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exploratory Data Analysis and Visualization"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The goal of __exploratory data analysis (EDA)__ is to explore attributes across multiple entities to decide what statistical or machine learning techniques to apply to the data. Visualizations are used to assist in understanding the data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Requirement already satisfied: pandas in /opt/conda/lib/python3.8/site-packages (1.2.2)\n",
      "Requirement already satisfied: python-dateutil>=2.7.3 in /opt/conda/lib/python3.8/site-packages (from pandas) (2.8.1)\n",
      "Requirement already satisfied: pytz>=2017.3 in /opt/conda/lib/python3.8/site-packages (from pandas) (2021.1)\n",
      "Requirement already satisfied: numpy>=1.16.5 in /opt/conda/lib/python3.8/site-packages (from pandas) (1.19.5)\n",
      "Requirement already satisfied: six>=1.5 in /opt/conda/lib/python3.8/site-packages (from python-dateutil>=2.7.3->pandas) (1.15.0)\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Unnamed: 0</th>\n",
       "      <th>Unnamed: 0.1</th>\n",
       "      <th>Form</th>\n",
       "      <th>State</th>\n",
       "      <th>Security_Grade</th>\n",
       "      <th>Area_Number</th>\n",
       "      <th>Terrain_Description</th>\n",
       "      <th>Favorable_Influences</th>\n",
       "      <th>Detrimental_Influences</th>\n",
       "      <th>INHABITANTS_Type</th>\n",
       "      <th>...</th>\n",
       "      <th>max_annual_income</th>\n",
       "      <th>terrain_rolling</th>\n",
       "      <th>white_collar</th>\n",
       "      <th>mixture_or_jewish</th>\n",
       "      <th>professional</th>\n",
       "      <th>business_or_executive</th>\n",
       "      <th>laborer</th>\n",
       "      <th>clerks</th>\n",
       "      <th>mechanics</th>\n",
       "      <th>industrial</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>NS FORM-8 6-1-37</td>\n",
       "      <td>Maryland</td>\n",
       "      <td>A</td>\n",
       "      <td>1</td>\n",
       "      <td>undulating</td>\n",
       "      <td>Very nicely planned residential area of medium...</td>\n",
       "      <td>No</td>\n",
       "      <td>executives professional men</td>\n",
       "      <td>...</td>\n",
       "      <td>5000.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>NS FORM-8 6-1-37</td>\n",
       "      <td>Maryland</td>\n",
       "      <td>A</td>\n",
       "      <td>2</td>\n",
       "      <td>rolling</td>\n",
       "      <td>Fairly new suburban area of homogeneous charac...</td>\n",
       "      <td>No</td>\n",
       "      <td>substantial middle class</td>\n",
       "      <td>...</td>\n",
       "      <td>5000.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>NS FORM-8 6-1-37</td>\n",
       "      <td>Maryland</td>\n",
       "      <td>A</td>\n",
       "      <td>3</td>\n",
       "      <td>rolling</td>\n",
       "      <td>Good residential area. Well planned.</td>\n",
       "      <td>Distance to City</td>\n",
       "      <td>executives professional men</td>\n",
       "      <td>...</td>\n",
       "      <td>7000.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>3 rows × 43 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "   Unnamed: 0  Unnamed: 0.1              Form     State Security_Grade  \\\n",
       "0           0             1  NS FORM-8 6-1-37  Maryland              A   \n",
       "1           1             0  NS FORM-8 6-1-37  Maryland              A   \n",
       "2           2             2  NS FORM-8 6-1-37  Maryland              A   \n",
       "\n",
       "   Area_Number Terrain_Description  \\\n",
       "0            1         undulating    \n",
       "1            2             rolling   \n",
       "2            3             rolling   \n",
       "\n",
       "                                Favorable_Influences Detrimental_Influences  \\\n",
       "0  Very nicely planned residential area of medium...                     No   \n",
       "1  Fairly new suburban area of homogeneous charac...                     No   \n",
       "2               Good residential area. Well planned.       Distance to City   \n",
       "\n",
       "              INHABITANTS_Type  ... max_annual_income terrain_rolling  \\\n",
       "0  executives professional men  ...            5000.0               1   \n",
       "1     substantial middle class  ...            5000.0               1   \n",
       "2  executives professional men  ...            7000.0               1   \n",
       "\n",
       "   white_collar mixture_or_jewish professional business_or_executive laborer  \\\n",
       "0             0                 0            1                     1       0   \n",
       "1             0                 0            0                     0       0   \n",
       "2             0                 0            1                     1       0   \n",
       "\n",
       "  clerks mechanics industrial  \n",
       "0      0         0          0  \n",
       "1      0         0          0  \n",
       "2      0         0          0  \n",
       "\n",
       "[3 rows x 43 columns]"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# loads the pandas library \n",
    "import pandas as pd\n",
    "import warnings\n",
    "warnings.simplefilter(action='ignore', category=FutureWarning)  # Ignore Pandas future warnings\n",
    "\n",
    "# creates data frame named df by reading in the Baltimore csv\n",
    "df = pd.read_csv(\"manipulated_baltimore_data.csv\")\n",
    "df.head(n=3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `.describe()` function summarizes a data frame column. Since the data type of `max_building_age` is currently type 'object', which in python is an indcator of type 'string', we have to first convert this attribute into a numeric value."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "count    46.000000\n",
       "mean     30.086957\n",
       "std      16.497577\n",
       "min      10.000000\n",
       "25%      20.000000\n",
       "50%      25.000000\n",
       "75%      40.000000\n",
       "max      65.000000\n",
       "Name: max_building_age, dtype: float64"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['max_building_age'].describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now that `max_building_age` is numeric type, we see that `describe()` provides __summary statistics__ on this attribute."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "count    46.000000\n",
       "mean     30.086957\n",
       "std      16.497577\n",
       "min      10.000000\n",
       "25%      20.000000\n",
       "50%      25.000000\n",
       "75%      40.000000\n",
       "max      65.000000\n",
       "Name: max_building_age, dtype: float64"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# converts max_building age to numeric type\n",
    "df[\"max_building_age\"] = pd.to_numeric(df[\"max_building_age\"])\n",
    "df['max_building_age'].describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can the same operations to `max_annual_income`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "count       46.000000\n",
       "mean      3139.130435\n",
       "std       2009.806874\n",
       "min       1000.000000\n",
       "25%       1850.000000\n",
       "50%       2750.000000\n",
       "75%       4000.000000\n",
       "max      10000.000000\n",
       "Name: max_annual_income, dtype: float64"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['max_annual_income'].describe()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "count       46.000000\n",
       "mean      3139.130435\n",
       "std       2009.806874\n",
       "min       1000.000000\n",
       "25%       1850.000000\n",
       "50%       2750.000000\n",
       "75%       4000.000000\n",
       "max      10000.000000\n",
       "Name: max_annual_income, dtype: float64"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['max_annual_income'] = pd.to_numeric(df['max_annual_income'])\n",
    "df['max_annual_income'].describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally we create some plots our data. A __scatter plot__ and a __bar chart__ are shown below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<script type='text/javascript' src='https://10ay.online.tableau.com/javascripts/api/viz_v1.js'></script><div class='tableauPlaceholder' style='width: 1920px; height: 895px;'><object class='tableauViz' width='1920' height='895' style='display:none;'><param name='host_url' value='https%3A%2F%2F10ay.online.tableau.com%2F' /> <param name='embed_code_version' value='3' /> <param name='site_root' value='&#47;t&#47;sadata' /><param name='name' value='mapping_inequality&#47;Sheet1' /><param name='tabs' value='no' /><param name='toolbar' value='yes' /><param name='showAppBanner' value='false' /><param name='filter' value='iframeSizedToWindow=true' /></object></div>\n"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "%%HTML\n",
    "<script type='text/javascript' src='https://10ay.online.tableau.com/javascripts/api/viz_v1.js'></script><div class='tableauPlaceholder' style='width: 1920px; height: 895px;'><object class='tableauViz' width='1920' height='895' style='display:none;'><param name='host_url' value='https%3A%2F%2F10ay.online.tableau.com%2F' /> <param name='embed_code_version' value='3' /> <param name='site_root' value='&#47;t&#47;sadata' /><param name='name' value='mapping_inequality&#47;Sheet1' /><param name='tabs' value='no' /><param name='toolbar' value='yes' /><param name='showAppBanner' value='false' /><param name='filter' value='iframeSizedToWindow=true' /></object></div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Exercise 4"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> 1. Hover over different points and explore their additional characteristics. __Note__:`INHABITANTS_F/N` should be multiplied by 100 to be a percent.\n",
    "2. The different points are clustered by grades. Which clusters have the most variation?\n",
    "3. How does `BUILDINGS_Construction` vary across the different points?\n",
    "4. Can you identify a trend overall? "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<script type='text/javascript' src='https://public.tableau.com/javascripts/api/tableau-2.min.js'></script><div class='tableauPlaceholder' style='width: 1440px; height: 715px;'><object class='tableauViz' width='1440' height='715' style='display:none;'><param name='host_url' value='https%3A%2F%2F10ay.online.tableau.com%2F' /> <param name='embed_code_version' value='3' /> <param name='site_root' value='&#47;t&#47;sadata' /><param name='name' value='mapping_inequality&#47;Sheet3' /><param name='tabs' value='no' /><param name='toolbar' value='yes' /><param name='showAppBanner' value='false' /><param name='filter' value='iframeSizedToWindow=true' /></object></div>\n"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "%%HTML\n",
    "<script type='text/javascript' src='https://public.tableau.com/javascripts/api/tableau-2.min.js'></script><div class='tableauPlaceholder' style='width: 1440px; height: 715px;'><object class='tableauViz' width='1440' height='715' style='display:none;'><param name='host_url' value='https%3A%2F%2F10ay.online.tableau.com%2F' /> <param name='embed_code_version' value='3' /> <param name='site_root' value='&#47;t&#47;sadata' /><param name='name' value='mapping_inequality&#47;Sheet3' /><param name='tabs' value='no' /><param name='toolbar' value='yes' /><param name='showAppBanner' value='false' /><param name='filter' value='iframeSizedToWindow=true' /></object></div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Excercise 5"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> 1. Recall the preperations done to the INHABITANTS_Foreignborn, how might this have influenced these outcomes?\n",
    "2. What can you learn from this graph?\n",
    "3. What do you learn about the different grades?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<script type='text/javascript' src='https://10ay.online.tableau.com/javascripts/api/viz_v1.js'></script><div class='tableauPlaceholder' style='width: 1440px; height: 715px;'><object class='tableauViz' width='1440' height='715' style='display:none;'><param name='host_url' value='https%3A%2F%2F10ay.online.tableau.com%2F' /> <param name='embed_code_version' value='3' /> <param name='site_root' value='&#47;t&#47;sadata' /><param name='name' value='mapping_inequality&#47;Sheet2' /><param name='tabs' value='no' /><param name='toolbar' value='yes' /><param name='showAppBanner' value='false' /><param name='filter' value='iframeSizedToWindow=true' /></object></div>\n"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "%%HTML\n",
    "<script type='text/javascript' src='https://10ay.online.tableau.com/javascripts/api/viz_v1.js'></script><div class='tableauPlaceholder' style='width: 1440px; height: 715px;'><object class='tableauViz' width='1440' height='715' style='display:none;'><param name='host_url' value='https%3A%2F%2F10ay.online.tableau.com%2F' /> <param name='embed_code_version' value='3' /> <param name='site_root' value='&#47;t&#47;sadata' /><param name='name' value='mapping_inequality&#47;Sheet2' /><param name='tabs' value='no' /><param name='toolbar' value='yes' /><param name='showAppBanner' value='false' /><param name='filter' value='iframeSizedToWindow=true' /></object></div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Exercise 6"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> 1. What can you learn from this graph?\n",
    "2. What are some explanations for the outcomes?\n",
    "3. What can you learn about the different grades?\n",
    "4. Compare to the previous graph, what are the similarities and differences?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}