{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Exploration Methods\n",
    "When you receive a CSV file from an external source, you'll want to get a feel for the data.\n",
    "\n",
    "Let's import some data and then learn how to explore it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'data/users.csv'"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Setup\n",
    "import os\n",
    "import pandas as pd\n",
    "\n",
    "# We use os.path.join because Windows uses a back slash (\\) to separate directories\n",
    "#  while others use a forward slash (/)\n",
    "users_file_name = os.path.join('data', 'users.csv')\n",
    "users_file_name"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## CSV File Exploration\n",
    "If you want to take a peek at your CSV file, you could open it in an editor. \n",
    "\n",
    "Let's just use some standard Python to see the first couple lines of the file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      ",first_name,last_name,email,email_verified,signup_date,referral_count,balance\n",
      "\n",
      "aaron,Aaron,Davis,aaron6348@gmail.com,True,2018-08-31,6,18.14\n",
      "\n",
      "acook,Anthony,Cook,cook@gmail.com,True,2018-05-12,2,55.45\n",
      "\n",
      "adam.saunders,Adam,Saunders,adam@gmail.com,False,2018-05-29,3,72.12\n",
      "\n",
      "adrian,Adrian,Fang,adrian.fang@teamtreehouse.com,True,2018-04-28,3,30.01\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# Open the file and print out the first 5 lines\n",
    "with open(users_file_name) as lines:\n",
    "    for _ in range(5):\n",
    "        # The `file` object is an iterator, so just get the next line \n",
    "        print(next(lines))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Notice how the first line is a header row. It has column names in it. By default, it will be assumed that the first row is your header row.\n",
    "\n",
    "Also note how the first column of that header row is empty...the values below in that first column appear to be usernames.  They are what we want for the index.\n",
    "\n",
    "We can use the `index_col` parameter of the `pandas.read_csv` function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create a new `DataFrame` and set the index to the first column\n",
    "users = pd.read_csv(users_file_name, index_col=0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Explore your imported DataFrame"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A quick way to check and see if you got your CSV file read correctly is to use the [`DataFrame.head`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.head.html) method. This gives you the first **x** number of rows. The head method, by default, returns 5 records. You can specify the number you want as the first argument, for instance `users.head(10)` will return the first 10 rows."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>first_name</th>\n",
       "      <th>last_name</th>\n",
       "      <th>email</th>\n",
       "      <th>email_verified</th>\n",
       "      <th>signup_date</th>\n",
       "      <th>referral_count</th>\n",
       "      <th>balance</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>aaron</th>\n",
       "      <td>Aaron</td>\n",
       "      <td>Davis</td>\n",
       "      <td>aaron6348@gmail.com</td>\n",
       "      <td>True</td>\n",
       "      <td>2018-08-31</td>\n",
       "      <td>6</td>\n",
       "      <td>18.14</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>acook</th>\n",
       "      <td>Anthony</td>\n",
       "      <td>Cook</td>\n",
       "      <td>cook@gmail.com</td>\n",
       "      <td>True</td>\n",
       "      <td>2018-05-12</td>\n",
       "      <td>2</td>\n",
       "      <td>55.45</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>adam.saunders</th>\n",
       "      <td>Adam</td>\n",
       "      <td>Saunders</td>\n",
       "      <td>adam@gmail.com</td>\n",
       "      <td>False</td>\n",
       "      <td>2018-05-29</td>\n",
       "      <td>3</td>\n",
       "      <td>72.12</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>adrian</th>\n",
       "      <td>Adrian</td>\n",
       "      <td>Fang</td>\n",
       "      <td>adrian.fang@teamtreehouse.com</td>\n",
       "      <td>True</td>\n",
       "      <td>2018-04-28</td>\n",
       "      <td>3</td>\n",
       "      <td>30.01</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>adrian.blair</th>\n",
       "      <td>Adrian</td>\n",
       "      <td>Blair</td>\n",
       "      <td>adrian9335@gmail.com</td>\n",
       "      <td>True</td>\n",
       "      <td>2018-06-16</td>\n",
       "      <td>7</td>\n",
       "      <td>25.85</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              first_name last_name                          email  \\\n",
       "aaron              Aaron     Davis            aaron6348@gmail.com   \n",
       "acook            Anthony      Cook                 cook@gmail.com   \n",
       "adam.saunders       Adam  Saunders                 adam@gmail.com   \n",
       "adrian            Adrian      Fang  adrian.fang@teamtreehouse.com   \n",
       "adrian.blair      Adrian     Blair           adrian9335@gmail.com   \n",
       "\n",
       "               email_verified signup_date  referral_count  balance  \n",
       "aaron                    True  2018-08-31               6    18.14  \n",
       "acook                    True  2018-05-12               2    55.45  \n",
       "adam.saunders           False  2018-05-29               3    72.12  \n",
       "adrian                   True  2018-04-28               3    30.01  \n",
       "adrian.blair             True  2018-06-16               7    25.85  "
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "users.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Nice! We got it.  So let's see how many rows we have.  There are a couple of ways.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "475"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Pythonic approach still works\n",
    "len(users)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*Side note*: This length call is quick. Under the covers this `DataFrame.__len__` call is actually performing a `len(df.index)`, counting the rows by using the index. You might see older code that uses the style of `len(df.index)` to get a count of rows. As of pandas version 0.11, `len(df)` is the same as `len(df.index)`.  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `DataFrame.shape` property works just like `np.array.shape` does. This is the length of each axis of your data frame, rows and columns."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(475, 7)"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "users.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Exploring from a bird's eye view"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Counts"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The [`DataFrame.count`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.count.html) method will count up each column for how many non-empty values we have."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "first_name        475\n",
       "last_name         430\n",
       "email             475\n",
       "email_verified    475\n",
       "signup_date       475\n",
       "referral_count    475\n",
       "balance           475\n",
       "dtype: int64"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "users.count()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Most of our columns include values for each row, but looks like **last_name** has some missing ones.  The missing data will show up as `np.nan` -- NumPy's not a number -- in those records.\n",
    "\n",
    "\n",
    "The `count` method is data missing aware."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Remember that a `DataFrame` has the ability to contain multiple data types or [`dtypes`](https://pandas.pydata.org/pandas-docs/stable/basics.html#basics-dtypes).\n",
    "\n",
    "You can use the `DataFrame.dtypes` to see the `dtype` of each column."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Data types"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "first_name         object\n",
       "last_name          object\n",
       "email              object\n",
       "email_verified       bool\n",
       "signup_date        object\n",
       "referral_count      int64\n",
       "balance           float64\n",
       "dtype: object"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "users.dtypes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As you can see, most of the data types of these columns were inferred, or assumed, correctly.  See how automatically **`email_verified`** is `bool`, **`referral_count`** is an integer, and **`balance`** a float. This happened when we used `pd.read_csv`. \n",
    "\n",
    "One thing to note though that the **`signup_date`** field is an `object` and not a `datetime`. You can convert these during or after import if you need to, and we'll do some of that later in this course."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Describe your data\n",
    "The [`DataFrame.describe`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.describe.html) method is a great way to get a vibe for all numeric data in your `DataFrame`.  You'll notice only columns that have numeric data are returned, and ones with booleans or text, like **`email_verified`** and **`first_name`** are left out. \n",
    "\n",
    "\n",
    "You'll see lots of different aggregations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>referral_count</th>\n",
       "      <th>balance</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>475.000000</td>\n",
       "      <td>475.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>3.429474</td>\n",
       "      <td>49.933263</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>2.281085</td>\n",
       "      <td>28.280448</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.050000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25%</th>\n",
       "      <td>2.000000</td>\n",
       "      <td>25.305000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50%</th>\n",
       "      <td>3.000000</td>\n",
       "      <td>51.570000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75%</th>\n",
       "      <td>5.000000</td>\n",
       "      <td>74.480000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>7.000000</td>\n",
       "      <td>99.900000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       referral_count     balance\n",
       "count      475.000000  475.000000\n",
       "mean         3.429474   49.933263\n",
       "std          2.281085   28.280448\n",
       "min          0.000000    0.050000\n",
       "25%          2.000000   25.305000\n",
       "50%          3.000000   51.570000\n",
       "75%          5.000000   74.480000\n",
       "max          7.000000   99.900000"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "users.describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Most of these aggregations are available by themselves as well"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "email_verified     0.818947\n",
       "referral_count     3.429474\n",
       "balance           49.933263\n",
       "dtype: float64"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# The mean or average\n",
    "users.mean()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "email_verified     0.385468\n",
       "referral_count     2.281085\n",
       "balance           28.280448\n",
       "dtype: float64"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Standard deviation\n",
    "users.std()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "first_name                       Aaron\n",
       "email             aalvarez@hotmail.com\n",
       "email_verified                   False\n",
       "signup_date                 2018-01-01\n",
       "referral_count                       0\n",
       "balance                           0.05\n",
       "dtype: object"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# The minimum of each column\n",
    "users.min()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "first_name                Zachary\n",
       "email             zneal@gmail.com\n",
       "email_verified               True\n",
       "signup_date            2018-09-25\n",
       "referral_count                  7\n",
       "balance                      99.9\n",
       "dtype: object"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# The maximum of each column\n",
    "users.max()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Since columns are in reality a `Series` you can quickly access their counts of different values using the [`value_counts`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.value_counts.html) method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True     389\n",
       "False     86\n",
       "Name: email_verified, dtype: int64"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "users.email_verified.value_counts()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "By default the value counts are sorted descending, so the most frequent are at top."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Mark           11\n",
       "David          10\n",
       "Michael         9\n",
       "Jennifer        7\n",
       "Christopher     7\n",
       "Name: first_name, dtype: int64"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Most common first name\n",
    "users.first_name.value_counts().head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Rearranging your data\n",
    "You can create a new `DataFrame` that is sorted by using the [`sort_values`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.sort_values.html) method.\n",
    "\n",
    "Let's sort the DataFrame so that the user with the highest **`balance`** is at the top. By default, ascending order is assumed, you can change that by setting the `ascending` keyword argument to `False`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>first_name</th>\n",
       "      <th>last_name</th>\n",
       "      <th>email</th>\n",
       "      <th>email_verified</th>\n",
       "      <th>signup_date</th>\n",
       "      <th>referral_count</th>\n",
       "      <th>balance</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>twhite</th>\n",
       "      <td>Timothy</td>\n",
       "      <td>White</td>\n",
       "      <td>white5136@hotmail.com</td>\n",
       "      <td>True</td>\n",
       "      <td>2018-07-06</td>\n",
       "      <td>5</td>\n",
       "      <td>99.90</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>karen.snow</th>\n",
       "      <td>Karen</td>\n",
       "      <td>Snow</td>\n",
       "      <td>ksnow@yahoo.com</td>\n",
       "      <td>True</td>\n",
       "      <td>2018-05-06</td>\n",
       "      <td>2</td>\n",
       "      <td>99.38</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>king</th>\n",
       "      <td>Billy</td>\n",
       "      <td>King</td>\n",
       "      <td>billy.king@hotmail.com</td>\n",
       "      <td>True</td>\n",
       "      <td>2018-05-29</td>\n",
       "      <td>4</td>\n",
       "      <td>98.80</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>king3246</th>\n",
       "      <td>Brittney</td>\n",
       "      <td>King</td>\n",
       "      <td>brittney@yahoo.com</td>\n",
       "      <td>True</td>\n",
       "      <td>2018-04-15</td>\n",
       "      <td>6</td>\n",
       "      <td>98.79</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>crane203</th>\n",
       "      <td>Valerie</td>\n",
       "      <td>Crane</td>\n",
       "      <td>valerie7051@hotmail.com</td>\n",
       "      <td>True</td>\n",
       "      <td>2018-05-12</td>\n",
       "      <td>3</td>\n",
       "      <td>98.69</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           first_name last_name                    email  email_verified  \\\n",
       "twhite        Timothy     White    white5136@hotmail.com            True   \n",
       "karen.snow      Karen      Snow          ksnow@yahoo.com            True   \n",
       "king            Billy      King   billy.king@hotmail.com            True   \n",
       "king3246     Brittney      King       brittney@yahoo.com            True   \n",
       "crane203      Valerie     Crane  valerie7051@hotmail.com            True   \n",
       "\n",
       "           signup_date  referral_count  balance  \n",
       "twhite      2018-07-06               5    99.90  \n",
       "karen.snow  2018-05-06               2    99.38  \n",
       "king        2018-05-29               4    98.80  \n",
       "king3246    2018-04-15               6    98.79  \n",
       "crane203    2018-05-12               3    98.69  "
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "users.sort_values(by='balance', ascending=False).head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You'll notice that `sort_values` call actually created a new `DataFrame`. If you want to permanently change the sort from the default (sorted by index), you can pass `True` as an argument to the `inplace` keyword parameter."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>first_name</th>\n",
       "      <th>last_name</th>\n",
       "      <th>email</th>\n",
       "      <th>email_verified</th>\n",
       "      <th>signup_date</th>\n",
       "      <th>referral_count</th>\n",
       "      <th>balance</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>darlene.adams</th>\n",
       "      <td>Darlene</td>\n",
       "      <td>Adams</td>\n",
       "      <td>adams@hotmail.com</td>\n",
       "      <td>True</td>\n",
       "      <td>2018-09-15</td>\n",
       "      <td>2</td>\n",
       "      <td>67.02</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>lauren</th>\n",
       "      <td>Lauren</td>\n",
       "      <td>Aguilar</td>\n",
       "      <td>lauren.aguilar@summers.com</td>\n",
       "      <td>False</td>\n",
       "      <td>2018-05-31</td>\n",
       "      <td>4</td>\n",
       "      <td>69.90</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>daniel</th>\n",
       "      <td>Daniel</td>\n",
       "      <td>Allen</td>\n",
       "      <td>allen@hotmail.com</td>\n",
       "      <td>False</td>\n",
       "      <td>2018-07-01</td>\n",
       "      <td>2</td>\n",
       "      <td>21.21</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>kallen</th>\n",
       "      <td>Kathy</td>\n",
       "      <td>Allen</td>\n",
       "      <td>kathy@hotmail.com</td>\n",
       "      <td>False</td>\n",
       "      <td>2018-02-20</td>\n",
       "      <td>1</td>\n",
       "      <td>43.72</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>alvarado</th>\n",
       "      <td>Denise</td>\n",
       "      <td>Alvarado</td>\n",
       "      <td>alvarado@hotmail.com</td>\n",
       "      <td>True</td>\n",
       "      <td>2018-09-07</td>\n",
       "      <td>6</td>\n",
       "      <td>26.72</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              first_name last_name                       email  \\\n",
       "darlene.adams    Darlene     Adams           adams@hotmail.com   \n",
       "lauren            Lauren   Aguilar  lauren.aguilar@summers.com   \n",
       "daniel            Daniel     Allen           allen@hotmail.com   \n",
       "kallen             Kathy     Allen           kathy@hotmail.com   \n",
       "alvarado          Denise  Alvarado        alvarado@hotmail.com   \n",
       "\n",
       "               email_verified signup_date  referral_count  balance  \n",
       "darlene.adams            True  2018-09-15               2    67.02  \n",
       "lauren                  False  2018-05-31               4    69.90  \n",
       "daniel                  False  2018-07-01               2    21.21  \n",
       "kallen                  False  2018-02-20               1    43.72  \n",
       "alvarado                 True  2018-09-07               6    26.72  "
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Sort first by last_name and then first_name. By default, np.nan show up at the end\n",
    "users.sort_values(by=['last_name', 'first_name'], inplace=True)\n",
    "# Sort order is now changed\n",
    "users.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And if you want to sort by the index, like it was originally, you can use the `sort_index` method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>first_name</th>\n",
       "      <th>last_name</th>\n",
       "      <th>email</th>\n",
       "      <th>email_verified</th>\n",
       "      <th>signup_date</th>\n",
       "      <th>referral_count</th>\n",
       "      <th>balance</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>aaron</th>\n",
       "      <td>Aaron</td>\n",
       "      <td>Davis</td>\n",
       "      <td>aaron6348@gmail.com</td>\n",
       "      <td>True</td>\n",
       "      <td>2018-08-31</td>\n",
       "      <td>6</td>\n",
       "      <td>18.14</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>acook</th>\n",
       "      <td>Anthony</td>\n",
       "      <td>Cook</td>\n",
       "      <td>cook@gmail.com</td>\n",
       "      <td>True</td>\n",
       "      <td>2018-05-12</td>\n",
       "      <td>2</td>\n",
       "      <td>55.45</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>adam.saunders</th>\n",
       "      <td>Adam</td>\n",
       "      <td>Saunders</td>\n",
       "      <td>adam@gmail.com</td>\n",
       "      <td>False</td>\n",
       "      <td>2018-05-29</td>\n",
       "      <td>3</td>\n",
       "      <td>72.12</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>adrian</th>\n",
       "      <td>Adrian</td>\n",
       "      <td>Fang</td>\n",
       "      <td>adrian.fang@teamtreehouse.com</td>\n",
       "      <td>True</td>\n",
       "      <td>2018-04-28</td>\n",
       "      <td>3</td>\n",
       "      <td>30.01</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>adrian.blair</th>\n",
       "      <td>Adrian</td>\n",
       "      <td>Blair</td>\n",
       "      <td>adrian9335@gmail.com</td>\n",
       "      <td>True</td>\n",
       "      <td>2018-06-16</td>\n",
       "      <td>7</td>\n",
       "      <td>25.85</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              first_name last_name                          email  \\\n",
       "aaron              Aaron     Davis            aaron6348@gmail.com   \n",
       "acook            Anthony      Cook                 cook@gmail.com   \n",
       "adam.saunders       Adam  Saunders                 adam@gmail.com   \n",
       "adrian            Adrian      Fang  adrian.fang@teamtreehouse.com   \n",
       "adrian.blair      Adrian     Blair           adrian9335@gmail.com   \n",
       "\n",
       "               email_verified signup_date  referral_count  balance  \n",
       "aaron                    True  2018-08-31               6    18.14  \n",
       "acook                    True  2018-05-12               2    55.45  \n",
       "adam.saunders           False  2018-05-29               3    72.12  \n",
       "adrian                   True  2018-04-28               3    30.01  \n",
       "adrian.blair             True  2018-06-16               7    25.85  "
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "users.sort_index(inplace=True)\n",
    "users.head()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}