{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Chapter 3\n",
    "---\n",
    "# Data Wrangling\n",
    "\n",
    "### 3.0 Introduction\n",
    "Data wrangling is a broad term used, often informally, to describe the process of transforming raw data to a clean and organized format ready for use.\n",
    "\n",
    "The most common data structure used to \"wrangle\" data is the data frame, which can be both intuitive and incredibly versatile. Data frames are tabular, meaning that htey are based on rows and columns like you'd find in a spreadsheet\n",
    "\n",
    "### 3.1 Creating a Data Frame"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "dataframe = pd.DataFrame()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.2 Describing the Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "                           Name PClass   Age     Sex  Survived  SexCode\n",
      "0  Allen, Miss Elisabeth Walton    1st  29.0  female         1        1\n",
      "1   Allison, Miss Helen Loraine    1st   2.0  female         0        1\n",
      "Dimensions: (1313, 6)\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Age</th>\n",
       "      <th>Survived</th>\n",
       "      <th>SexCode</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>756.000000</td>\n",
       "      <td>1313.000000</td>\n",
       "      <td>1313.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>30.397989</td>\n",
       "      <td>0.342727</td>\n",
       "      <td>0.351866</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>14.259049</td>\n",
       "      <td>0.474802</td>\n",
       "      <td>0.477734</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>0.170000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25%</th>\n",
       "      <td>21.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50%</th>\n",
       "      <td>28.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75%</th>\n",
       "      <td>39.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>1.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>71.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>1.000000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              Age     Survived      SexCode\n",
       "count  756.000000  1313.000000  1313.000000\n",
       "mean    30.397989     0.342727     0.351866\n",
       "std     14.259049     0.474802     0.477734\n",
       "min      0.170000     0.000000     0.000000\n",
       "25%     21.000000     0.000000     0.000000\n",
       "50%     28.000000     0.000000     0.000000\n",
       "75%     39.000000     1.000000     1.000000\n",
       "max     71.000000     1.000000     1.000000"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "url = \"http://tinyurl.com/titanic-csv\"\n",
    "df = pd.read_csv(url)\n",
    "# show first two rows\n",
    "print(df.head(2)) # also try tail(2) for last two rows\n",
    "\n",
    "# show dimensions\n",
    "print(\"Dimensions: {}\".format(df.shape))\n",
    "\n",
    "# show statistics\n",
    "df.describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.3 Navigating DataFrames"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Name        Allen, Miss Elisabeth Walton\n",
      "PClass                               1st\n",
      "Age                                   29\n",
      "Sex                               female\n",
      "Survived                               1\n",
      "SexCode                                1\n",
      "Name: 0, dtype: object\n",
      "                                            Name PClass   Age     Sex  \\\n",
      "1                    Allison, Miss Helen Loraine    1st   2.0  female   \n",
      "2            Allison, Mr Hudson Joshua Creighton    1st  30.0    male   \n",
      "3  Allison, Mrs Hudson JC (Bessie Waldo Daniels)    1st  25.0  female   \n",
      "\n",
      "   Survived  SexCode  \n",
      "1         0        1  \n",
      "2         0        0  \n",
      "3         0        1  \n",
      "                                            Name PClass   Age     Sex  \\\n",
      "0                   Allen, Miss Elisabeth Walton    1st  29.0  female   \n",
      "1                    Allison, Miss Helen Loraine    1st   2.0  female   \n",
      "2            Allison, Mr Hudson Joshua Creighton    1st  30.0    male   \n",
      "3  Allison, Mrs Hudson JC (Bessie Waldo Daniels)    1st  25.0  female   \n",
      "\n",
      "   Survived  SexCode  \n",
      "0         1        1  \n",
      "1         0        1  \n",
      "2         0        0  \n",
      "3         0        1  \n"
     ]
    }
   ],
   "source": [
    "# select the first row\n",
    "print(df.iloc[0])\n",
    "\n",
    "# select three rows\n",
    "print(df.iloc[1:4])\n",
    "\n",
    "# all rows up to and including the fourth row\n",
    "print(df.iloc[:4])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "DataFrames do not need to be numerically indexed. We can set the index of a DataFrame to any value where the value is unique to each row. For example, we can set the index to be passenger names and then select rows using a name:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Name        Allen, Miss Elisabeth Walton\n",
       "PClass                               1st\n",
       "Age                                   29\n",
       "Sex                               female\n",
       "Survived                               1\n",
       "SexCode                                1\n",
       "Name: Allen, Miss Elisabeth Walton, dtype: object"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# set index\n",
    "df = df.set_index(df['Name'])\n",
    "\n",
    "# show row\n",
    "df.loc['Allen, Miss Elisabeth Walton']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Discussion\n",
    "To select individual rows and slices of rows, pandas provides two methods:\n",
    "* `loc` is useful when the index of the DataFrame is a label (a string)\n",
    "* `iloc` works by looking for the position in the DataFrame. For exmaple, iloc[0] will return the first row regardless of whether the index is an integer or a label\n",
    "\n",
    "## 3.4 Selecting Rows Based on Conditionals"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>PClass</th>\n",
       "      <th>Age</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Survived</th>\n",
       "      <th>SexCode</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Allen, Miss Elisabeth Walton</th>\n",
       "      <td>Allen, Miss Elisabeth Walton</td>\n",
       "      <td>1st</td>\n",
       "      <td>29.0</td>\n",
       "      <td>female</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Allison, Miss Helen Loraine</th>\n",
       "      <td>Allison, Miss Helen Loraine</td>\n",
       "      <td>1st</td>\n",
       "      <td>2.0</td>\n",
       "      <td>female</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                      Name PClass   Age  \\\n",
       "Name                                                                      \n",
       "Allen, Miss Elisabeth Walton  Allen, Miss Elisabeth Walton    1st  29.0   \n",
       "Allison, Miss Helen Loraine    Allison, Miss Helen Loraine    1st   2.0   \n",
       "\n",
       "                                 Sex  Survived  SexCode  \n",
       "Name                                                     \n",
       "Allen, Miss Elisabeth Walton  female         1        1  \n",
       "Allison, Miss Helen Loraine   female         0        1  "
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# select top two rows where column 'sex' is 'female'\n",
    "df[df['Sex'] == 'female'].head(2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>PClass</th>\n",
       "      <th>Age</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Survived</th>\n",
       "      <th>SexCode</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Crosby, Mrs Edward Gifford (Catherine Elizabeth Halstead)</th>\n",
       "      <td>Crosby, Mrs Edward Gifford (Catherine Elizabet...</td>\n",
       "      <td>1st</td>\n",
       "      <td>69.0</td>\n",
       "      <td>female</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                                                                 Name  \\\n",
       "Name                                                                                                    \n",
       "Crosby, Mrs Edward Gifford (Catherine Elizabeth...  Crosby, Mrs Edward Gifford (Catherine Elizabet...   \n",
       "\n",
       "                                                   PClass   Age     Sex  \\\n",
       "Name                                                                      \n",
       "Crosby, Mrs Edward Gifford (Catherine Elizabeth...    1st  69.0  female   \n",
       "\n",
       "                                                    Survived  SexCode  \n",
       "Name                                                                   \n",
       "Crosby, Mrs Edward Gifford (Catherine Elizabeth...         1        1  "
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# multiple conditions\n",
    "df[(df['Sex'] == 'female') & (df['Age'] >= 65)]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3.5 Replacing Values"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Name\n",
       "Allen, Miss Elisabeth Walton    Woman\n",
       "Allison, Miss Helen Loraine     Woman\n",
       "Name: Sex, dtype: object"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# replace any instance of 'female' with Woman\n",
    "df['Sex'].replace('female', 'Woman').head(2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Name\n",
       "Allen, Miss Elisabeth Walton                     Woman\n",
       "Allison, Miss Helen Loraine                      Woman\n",
       "Allison, Mr Hudson Joshua Creighton                Man\n",
       "Allison, Mrs Hudson JC (Bessie Waldo Daniels)    Woman\n",
       "Allison, Master Hudson Trevor                      Man\n",
       "Name: Sex, dtype: object"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# replace any instance of 'female' with Woman\n",
    "df['Sex'].replace(['female', 'male'], ['Woman', 'Man']).head(5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>PClass</th>\n",
       "      <th>Age</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Survived</th>\n",
       "      <th>SexCode</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Allen, Miss Elisabeth Walton</th>\n",
       "      <td>Allen, Miss Elisabeth Walton</td>\n",
       "      <td>1st</td>\n",
       "      <td>29</td>\n",
       "      <td>female</td>\n",
       "      <td>One</td>\n",
       "      <td>One</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Allison, Miss Helen Loraine</th>\n",
       "      <td>Allison, Miss Helen Loraine</td>\n",
       "      <td>1st</td>\n",
       "      <td>2</td>\n",
       "      <td>female</td>\n",
       "      <td>0</td>\n",
       "      <td>One</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                      Name PClass Age     Sex  \\\n",
       "Name                                                                            \n",
       "Allen, Miss Elisabeth Walton  Allen, Miss Elisabeth Walton    1st  29  female   \n",
       "Allison, Miss Helen Loraine    Allison, Miss Helen Loraine    1st   2  female   \n",
       "\n",
       "                             Survived SexCode  \n",
       "Name                                           \n",
       "Allen, Miss Elisabeth Walton      One     One  \n",
       "Allison, Miss Helen Loraine         0     One  "
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.replace(1, \"One\").head(2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3.6 Renaming Columns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Passenger Class</th>\n",
       "      <th>Age</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Survived</th>\n",
       "      <th>SexCode</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Allen, Miss Elisabeth Walton</th>\n",
       "      <td>Allen, Miss Elisabeth Walton</td>\n",
       "      <td>1st</td>\n",
       "      <td>29.0</td>\n",
       "      <td>female</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Allison, Miss Helen Loraine</th>\n",
       "      <td>Allison, Miss Helen Loraine</td>\n",
       "      <td>1st</td>\n",
       "      <td>2.0</td>\n",
       "      <td>female</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                      Name Passenger Class  \\\n",
       "Name                                                                         \n",
       "Allen, Miss Elisabeth Walton  Allen, Miss Elisabeth Walton             1st   \n",
       "Allison, Miss Helen Loraine    Allison, Miss Helen Loraine             1st   \n",
       "\n",
       "                               Age     Sex  Survived  SexCode  \n",
       "Name                                                           \n",
       "Allen, Miss Elisabeth Walton  29.0  female         1        1  \n",
       "Allison, Miss Helen Loraine    2.0  female         0        1  "
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.rename(columns={'PClass': 'Passenger Class'}).head(2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Passenger Class</th>\n",
       "      <th>Age</th>\n",
       "      <th>Gender</th>\n",
       "      <th>Survived</th>\n",
       "      <th>SexCode</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Allen, Miss Elisabeth Walton</th>\n",
       "      <td>Allen, Miss Elisabeth Walton</td>\n",
       "      <td>1st</td>\n",
       "      <td>29.0</td>\n",
       "      <td>female</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Allison, Miss Helen Loraine</th>\n",
       "      <td>Allison, Miss Helen Loraine</td>\n",
       "      <td>1st</td>\n",
       "      <td>2.0</td>\n",
       "      <td>female</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                      Name Passenger Class  \\\n",
       "Name                                                                         \n",
       "Allen, Miss Elisabeth Walton  Allen, Miss Elisabeth Walton             1st   \n",
       "Allison, Miss Helen Loraine    Allison, Miss Helen Loraine             1st   \n",
       "\n",
       "                               Age  Gender  Survived  SexCode  \n",
       "Name                                                           \n",
       "Allen, Miss Elisabeth Walton  29.0  female         1        1  \n",
       "Allison, Miss Helen Loraine    2.0  female         0        1  "
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.rename(columns={'PClass': 'Passenger Class', 'Sex': 'Gender'}).head(2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3.7 Finding the Min, Max, Sum, Average, and Count"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Maximum: 71.0\n",
      "Minimum: 0.17\n",
      "Mean: 30.397989417989418\n",
      "Sum: 22980.88\n",
      "Count: 756\n"
     ]
    }
   ],
   "source": [
    "print('Maximum: {}'.format(df['Age'].max()))\n",
    "print('Minimum: {}'.format(df['Age'].min()))\n",
    "print('Mean: {}'.format(df['Age'].mean()))\n",
    "print('Sum: {}'.format(df['Age'].sum()))\n",
    "print('Count: {}'.format(df['Age'].count()))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In addition to these, pandas also offers variance (`var`), standard deviation (`std`), kurtosis (`kurt`), skewness (`skew`), and a number of others.\n",
    "\n",
    "We can also apply these methods to whole dataframes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Variance: Age         203.320470\n",
      "Survived      0.225437\n",
      "SexCode       0.228230\n",
      "dtype: float64\n",
      "Standard Deviation: Age         14.259049\n",
      "Survived     0.474802\n",
      "SexCode      0.477734\n",
      "dtype: float64\n",
      "Kurtosis: Age        -0.036536\n",
      "Survived   -1.562162\n",
      "SexCode    -1.616702\n",
      "dtype: float64\n",
      "Skewness: Age         0.368511\n",
      "Survived    0.663491\n",
      "SexCode     0.621098\n",
      "dtype: float64\n"
     ]
    }
   ],
   "source": [
    "print(\"Variance: {}\".format(df.var()))\n",
    "print(\"Standard Deviation: {}\".format(df.std()))\n",
    "print(\"Kurtosis: {}\".format(df.kurt()))\n",
    "print(\"Skewness: {}\".format(df.skew()))\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3.8 Finding Unique Values"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array(['female', 'male'], dtype=object)"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# unique will return an array of all unique values in a column\n",
    "df['Sex'].unique()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "male      851\n",
       "female    462\n",
       "Name: Sex, dtype: int64"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# value_counts will display all unique values with the number of times each value appears\n",
    "df['Sex'].value_counts()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3.9 Handling Missing Values"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>PClass</th>\n",
       "      <th>Age</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Survived</th>\n",
       "      <th>SexCode</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Aubert, Mrs Leontine Pauline</th>\n",
       "      <td>Aubert, Mrs Leontine Pauline</td>\n",
       "      <td>1st</td>\n",
       "      <td>NaN</td>\n",
       "      <td>female</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Barkworth, Mr Algernon H</th>\n",
       "      <td>Barkworth, Mr Algernon H</td>\n",
       "      <td>1st</td>\n",
       "      <td>NaN</td>\n",
       "      <td>male</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                      Name PClass  Age  \\\n",
       "Name                                                                     \n",
       "Aubert, Mrs Leontine Pauline  Aubert, Mrs Leontine Pauline    1st  NaN   \n",
       "Barkworth, Mr Algernon H          Barkworth, Mr Algernon H    1st  NaN   \n",
       "\n",
       "                                 Sex  Survived  SexCode  \n",
       "Name                                                     \n",
       "Aubert, Mrs Leontine Pauline  female         1        1  \n",
       "Barkworth, Mr Algernon H        male         1        0  "
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# select missing values, show 2 rows\n",
    "df[df['Age'].isnull()].head(2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3.10 Deleting a Column"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>PClass</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Survived</th>\n",
       "      <th>SexCode</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Allen, Miss Elisabeth Walton</th>\n",
       "      <td>Allen, Miss Elisabeth Walton</td>\n",
       "      <td>1st</td>\n",
       "      <td>female</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Allison, Miss Helen Loraine</th>\n",
       "      <td>Allison, Miss Helen Loraine</td>\n",
       "      <td>1st</td>\n",
       "      <td>female</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                      Name PClass     Sex  \\\n",
       "Name                                                                        \n",
       "Allen, Miss Elisabeth Walton  Allen, Miss Elisabeth Walton    1st  female   \n",
       "Allison, Miss Helen Loraine    Allison, Miss Helen Loraine    1st  female   \n",
       "\n",
       "                              Survived  SexCode  \n",
       "Name                                             \n",
       "Allen, Miss Elisabeth Walton         1        1  \n",
       "Allison, Miss Helen Loraine          0        1  "
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# axis=1 means the column axis\n",
    "df.drop('Age', axis=1).head(2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3.11 Deleting a Row"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>PClass</th>\n",
       "      <th>Age</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Survived</th>\n",
       "      <th>SexCode</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Allen, Miss Elisabeth Walton</th>\n",
       "      <td>Allen, Miss Elisabeth Walton</td>\n",
       "      <td>1st</td>\n",
       "      <td>29.0</td>\n",
       "      <td>female</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Allison, Miss Helen Loraine</th>\n",
       "      <td>Allison, Miss Helen Loraine</td>\n",
       "      <td>1st</td>\n",
       "      <td>2.0</td>\n",
       "      <td>female</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                      Name PClass   Age  \\\n",
       "Name                                                                      \n",
       "Allen, Miss Elisabeth Walton  Allen, Miss Elisabeth Walton    1st  29.0   \n",
       "Allison, Miss Helen Loraine    Allison, Miss Helen Loraine    1st   2.0   \n",
       "\n",
       "                                 Sex  Survived  SexCode  \n",
       "Name                                                     \n",
       "Allen, Miss Elisabeth Walton  female         1        1  \n",
       "Allison, Miss Helen Loraine   female         0        1  "
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# create new dataframe excluding the rows you want to delete\n",
    "df[df['Sex'] != 'male'].head(2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>PClass</th>\n",
       "      <th>Age</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Survived</th>\n",
       "      <th>SexCode</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Allen, Miss Elisabeth Walton</th>\n",
       "      <td>Allen, Miss Elisabeth Walton</td>\n",
       "      <td>1st</td>\n",
       "      <td>29.0</td>\n",
       "      <td>female</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Allison, Mr Hudson Joshua Creighton</th>\n",
       "      <td>Allison, Mr Hudson Joshua Creighton</td>\n",
       "      <td>1st</td>\n",
       "      <td>30.0</td>\n",
       "      <td>male</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                                    Name  \\\n",
       "Name                                                                       \n",
       "Allen, Miss Elisabeth Walton                Allen, Miss Elisabeth Walton   \n",
       "Allison, Mr Hudson Joshua Creighton  Allison, Mr Hudson Joshua Creighton   \n",
       "\n",
       "                                    PClass   Age     Sex  Survived  SexCode  \n",
       "Name                                                                         \n",
       "Allen, Miss Elisabeth Walton           1st  29.0  female         1        1  \n",
       "Allison, Mr Hudson Joshua Creighton    1st  30.0    male         0        0  "
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# delete a row by matching a unique value\n",
    "df[df['Name'] != 'Allison, Miss Helen Loraine'].head(2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>PClass</th>\n",
       "      <th>Age</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Survived</th>\n",
       "      <th>SexCode</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Name</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Allen, Miss Elisabeth Walton</th>\n",
       "      <td>Allen, Miss Elisabeth Walton</td>\n",
       "      <td>1st</td>\n",
       "      <td>29.0</td>\n",
       "      <td>female</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Allison, Miss Helen Loraine</th>\n",
       "      <td>Allison, Miss Helen Loraine</td>\n",
       "      <td>1st</td>\n",
       "      <td>2.0</td>\n",
       "      <td>female</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                      Name PClass   Age  \\\n",
       "Name                                                                      \n",
       "Allen, Miss Elisabeth Walton  Allen, Miss Elisabeth Walton    1st  29.0   \n",
       "Allison, Miss Helen Loraine    Allison, Miss Helen Loraine    1st   2.0   \n",
       "\n",
       "                                 Sex  Survived  SexCode  \n",
       "Name                                                     \n",
       "Allen, Miss Elisabeth Walton  female         1        1  \n",
       "Allison, Miss Helen Loraine   female         0        1  "
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# delete a row by index\n",
    "df[df.index != 0].head(2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3.13 Grouping Rows by Values"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Age</th>\n",
       "      <th>Survived</th>\n",
       "      <th>SexCode</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Sex</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>female</th>\n",
       "      <td>29.396424</td>\n",
       "      <td>0.666667</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>male</th>\n",
       "      <td>31.014338</td>\n",
       "      <td>0.166863</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              Age  Survived  SexCode\n",
       "Sex                                 \n",
       "female  29.396424  0.666667      1.0\n",
       "male    31.014338  0.166863      0.0"
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby('Sex').mean()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Survived\n",
       "0    863\n",
       "1    450\n",
       "Name: Name, dtype: int64"
      ]
     },
     "execution_count": 40,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby('Survived')['Name'].count()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Sex     Survived\n",
       "female  0           24.901408\n",
       "        1           30.867143\n",
       "male    0           32.320780\n",
       "        1           25.951875\n",
       "Name: Age, dtype: float64"
      ]
     },
     "execution_count": 41,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby(['Sex', 'Survived'])['Age'].mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3.14 Grouping Rows by Time"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3.15 Looping Over a Column"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "ALLEN, MISS ELISABETH WALTON\n",
      "ALLISON, MISS HELEN LORAINE\n"
     ]
    }
   ],
   "source": [
    "# for .. in .. loop\n",
    "for name in df['Name'][0:2]:\n",
    "    print(name.upper())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['ALLEN, MISS ELISABETH WALTON', 'ALLISON, MISS HELEN LORAINE']"
      ]
     },
     "execution_count": 45,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# list comprehension (more \"pythonic\")\n",
    "[name.upper() for name in df['Name'][0:2]]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3.16 Applying a Function Over All Elements in a Column"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Name\n",
       "Allen, Miss Elisabeth Walton    ALLEN, MISS ELISABETH WALTON\n",
       "Allison, Miss Helen Loraine      ALLISON, MISS HELEN LORAINE\n",
       "Name: Name, dtype: object"
      ]
     },
     "execution_count": 46,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "def uppercase(x):\n",
    "    return x.upper()\n",
    "\n",
    "df['Name'].apply(uppercase)[0:2]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Discussion\n",
    "`apply` is a great way to do data cleaning and wrangling. It is common to write a function to perform some useful operation (separate first and last names, convert string to floats, etc) and then map that funtion to every element in a column.\n",
    "\n",
    "## 3.17 Applying a Function to Groups"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>PClass</th>\n",
       "      <th>Age</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Survived</th>\n",
       "      <th>SexCode</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Sex</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>female</th>\n",
       "      <td>462</td>\n",
       "      <td>462</td>\n",
       "      <td>288</td>\n",
       "      <td>462</td>\n",
       "      <td>462</td>\n",
       "      <td>462</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>male</th>\n",
       "      <td>851</td>\n",
       "      <td>851</td>\n",
       "      <td>468</td>\n",
       "      <td>851</td>\n",
       "      <td>851</td>\n",
       "      <td>851</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        Name  PClass  Age  Sex  Survived  SexCode\n",
       "Sex                                              \n",
       "female   462     462  288  462       462      462\n",
       "male     851     851  468  851       851      851"
      ]
     },
     "execution_count": 47,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby('Sex').apply(lambda x: x.count())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "By combining `groupby` and `apply` we can calculate custom statistics or apply any function to each group separately\n",
    "\n",
    "## 3.18 Concatenating DataFrames"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### See Also\n",
    "* A Visual Explanation of SQL Joins (https://blog.codinghorror.com/a-visual-explanation-of-sql-joins/)\n",
    "* pandas' documentation on merging (https://pandas.pydata.org/pandas-docs/stable/merging.html)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python [conda env:machine_learning_cookbook]",
   "language": "python",
   "name": "conda-env-machine_learning_cookbook-py"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}