{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Combining DataFrames\n", "\n", "CashBox has provided us with several separate CSV files. Let's take a look at two files `transactions.csv` and `requests.csv`. Requests are made in the application when one user requests cash from another. Requests are not required for a transaction to occur. \n", "\n", "Let's see if we can't see get a feeling on how many successful requests and payments have been made. In order to do this we will need to combine the two `DataFrame`s." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "((998, 4), (313, 4))" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Setup\n", "from datetime import datetime\n", "import os\n", "\n", "import pandas as pd\n", "\n", "from utils import render\n", "\n", "transactions = pd.read_csv(os.path.join('data', 'transactions.csv'), index_col=0)\n", "requests = pd.read_csv(os.path.join('data', 'requests.csv'), index_col=0)\n", "\n", "# Pop out a quick sanity check\n", "(transactions.shape, requests.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's explore our data real quick, by taking a look at the first couple of rows in each `DataFrame`." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
senderreceiveramountsent_date
0steinsmoyer49.032018-01-24
1holden4580joshua.henry34.642018-02-06
\n", "
" ], "text/plain": [ " sender receiver amount sent_date\n", "0 stein smoyer 49.03 2018-01-24\n", "1 holden4580 joshua.henry 34.64 2018-02-06" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "transactions.head(2)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
from_userto_useramountrequest_date
0chad.chenpaula798078.612018-02-12
1kallenlmoore1.942018-02-23
\n", "
" ], "text/plain": [ " from_user to_user amount request_date\n", "0 chad.chen paula7980 78.61 2018-02-12\n", "1 kallen lmoore 1.94 2018-02-23" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "requests.head(2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I'd like to see all the requests that have a matching transaction based on the users and the amount involved.\n", "\n", "In order to do this we will merge both of our datasets together. \n", "\n", "We'll create a new dataset by using the [`DataFrame.merge`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.merge.html) method." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
from_userto_useramountrequest_datesenderreceiversent_date
0chad.chenpaula798078.612018-02-12paula7980chad.chen2018-07-15
1kallenlmoore1.942018-02-23lmoorekallen2018-03-05
2gregory.blackwellrodriguez576830.572018-03-04rodriguez5768gregory.blackwell2018-03-17
3kristina.millerjohn.hardy77.052018-03-12john.hardykristina.miller2018-04-25
4lacey8987mcguire54.092018-03-13mcguirelacey89872018-06-28
\n", "
" ], "text/plain": [ " from_user to_user amount request_date sender \\\n", "0 chad.chen paula7980 78.61 2018-02-12 paula7980 \n", "1 kallen lmoore 1.94 2018-02-23 lmoore \n", "2 gregory.blackwell rodriguez5768 30.57 2018-03-04 rodriguez5768 \n", "3 kristina.miller john.hardy 77.05 2018-03-12 john.hardy \n", "4 lacey8987 mcguire 54.09 2018-03-13 mcguire \n", "\n", " receiver sent_date \n", "0 chad.chen 2018-07-15 \n", "1 kallen 2018-03-05 \n", "2 gregory.blackwell 2018-03-17 \n", "3 kristina.miller 2018-04-25 \n", "4 lacey8987 2018-06-28 " ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Since we are calling merge on the `requests` DataFrame it is considered the left side\n", "successful_requests = requests.merge(\n", " # And transactions is the right side\n", " transactions, \n", " # So now we line up columns that will make the join make sense.\n", " left_on=['from_user', 'to_user', 'amount'], \n", " right_on=['receiver', 'sender', 'amount']\n", ")\n", "# Let's take a look and see how we did\n", "successful_requests.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that since the **`amount`** is the same on both sides of the merge that there is only one column represented." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Gather Insights\n", "\n", "So looking at this data merged together, I'd like to see the time difference between when the request was made, and when the money was actually received.\n", "\n", "Good news for us, pandas has very powerful date/time functionality, but in order to get there we're going to need to convert our columns. As you can see, the CSV import did not recognize our date field. **`sent_date`** and **`request_date`** are just plain old objects." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "from_user object\n", "to_user object\n", "amount float64\n", "request_date object\n", "sender object\n", "receiver object\n", "sent_date object\n", "dtype: object" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "successful_requests.dtypes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can convert columns (which remember are a `Series`) on CSV import or just when you need them by using the `pandas.to_datetime` method. There are actually quite a few conversion methods." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "from_user object\n", "to_user object\n", "amount float64\n", "request_date datetime64[ns]\n", "sender object\n", "receiver object\n", "sent_date datetime64[ns]\n", "dtype: object" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "successful_requests['request_date'] = pd.to_datetime(successful_requests['request_date'])\n", "successful_requests['sent_date'] = pd.to_datetime(successful_requests['sent_date'])\n", "# And now we can see they are converted\n", "successful_requests.dtypes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that we have dates we can subtract them (vectoization ftw!) to create a timedelta. Let's create a new column called **`time_passed`** that stores the result." 
] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "successful_requests['time_passed'] = successful_requests.sent_date - successful_requests.request_date" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, let's take a look at the top 5 longest request to successful transactions by sorting and limiting, to get a vibe." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
from_userto_useramountrequest_datesenderreceiversent_datetime_passed
0chad.chenpaula798078.612018-02-12paula7980chad.chen2018-07-15153 days
33sthompsonandrade14.072018-05-09andradesthompson2018-09-21135 days
4lacey8987mcguire54.092018-03-13mcguirelacey89872018-06-28107 days
53marcus.berrymelissa.mendoza71.482018-05-31melissa.mendozamarcus.berry2018-09-0698 days
39bishopmassey210218.272018-05-16massey2102bishop2018-08-1591 days
\n", "
" ], "text/plain": [ " from_user to_user amount request_date sender \\\n", "0 chad.chen paula7980 78.61 2018-02-12 paula7980 \n", "33 sthompson andrade 14.07 2018-05-09 andrade \n", "4 lacey8987 mcguire 54.09 2018-03-13 mcguire \n", "53 marcus.berry melissa.mendoza 71.48 2018-05-31 melissa.mendoza \n", "39 bishop massey2102 18.27 2018-05-16 massey2102 \n", "\n", " receiver sent_date time_passed \n", "0 chad.chen 2018-07-15 153 days \n", "33 sthompson 2018-09-21 135 days \n", "4 lacey8987 2018-06-28 107 days \n", "53 marcus.berry 2018-09-06 98 days \n", "39 bishop 2018-08-15 91 days " ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "successful_requests.sort_values(by='time_passed', ascending=False).head(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Hmm, that makes wonder how much money passed through the request and transaction system. Let's see." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Wow! $10,496.47 has passed through the request system in 214 transactions!!!'" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\"Wow! ${:,.2f} has passed through the request system in {} transactions!!!\".format(\n", " successful_requests.amount.sum(),\n", " len(successful_requests),\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Further research\n", "I saw something a little strange as I was looking through those **`successful_requests`**, it noticed a couple of what seemed like duplicated requests. I called my contact at CashBox and asked about possible duplicate requests. Sure enough, the application allows you to send multiple requests for the same amount. \n", "\n", "So this means there are probably duplicates in our **`successful_requests`** `DataFrame` because there are duplicates in the **`requests`**. There is most likely only one transaction that fulfills the request, but there could be multiple requests that match. Our merge brought that duplication across as well.\n", "\n", "Let's explore the possible duplicates in the **`requests`** `DataFrame`. There is a method [`DataFrame.duplicated`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.duplicated.html) that will return a boolean `Series` which we can use as an index. A `keep` parameter is available which is used to choose which of the duplicated rows to mark as a duplicate. You can mark the first, last or all of them." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
from_userto_useramountrequest_date
58austin486shelly11.242018-05-29
59austin486shelly11.242018-05-29
8cjimenezsarah.evans48.142018-03-21
26cjimenezsarah.evans48.142018-04-27
218clark8139moore14.542018-08-31
240clark8139moore14.542018-09-10
195diane4652dean23656.822018-08-21
224diane4652dean23656.822018-09-05
143donna1922danderson79.222018-07-23
157donna1922danderson79.222018-07-31
116edwards8658johnston228147.292018-07-10
127edwards8658johnston228147.292018-07-16
289heathersmercer98.132018-09-22
290heathersmercer98.132018-09-22
32hoover4302dennis87.772018-05-04
33hoover4302dennis87.772018-05-05
160jcolemanfuentes40.482018-08-03
216jcolemanfuentes40.482018-08-29
82jonesjones579695.882018-06-21
95jonesjones579695.882018-06-27
84kimberlywbrown9.022018-06-24
101kimberlywbrown9.022018-06-28
60kirkjason36.532018-05-30
85kirkjason36.532018-06-24
165lawrence1685joshua969099.962018-08-07
168lawrence1685joshua969099.962018-08-08
114lspencersarah44.892018-07-09
194lspencersarah44.892018-08-21
44matthew7940sherry611358.722018-05-13
108matthew7940sherry611358.722018-07-04
201michael7792jennifer90.562018-08-23
241michael7792jennifer90.562018-09-10
75nicholas.traviscorey73.252018-06-15
109nicholas.traviscorey73.252018-07-04
307patriciamartha696987.332018-09-25
312patriciamartha696987.332018-09-25
10paula7980mackenzie56.002018-03-29
15paula7980mackenzie56.002018-04-15
68paula7980mackenzie56.002018-06-02
221phillips5654miguel.gamble11.352018-09-01
260phillips5654miguel.gamble11.352018-09-15
139rebecca848mpaul1.342018-07-21
229rebecca848mpaul1.342018-09-06
115rodriguez9475miller855225.892018-07-09
121rodriguez9475miller855225.892018-07-13
\n", "
" ], "text/plain": [ " from_user to_user amount request_date\n", "58 austin486 shelly 11.24 2018-05-29\n", "59 austin486 shelly 11.24 2018-05-29\n", "8 cjimenez sarah.evans 48.14 2018-03-21\n", "26 cjimenez sarah.evans 48.14 2018-04-27\n", "218 clark8139 moore 14.54 2018-08-31\n", "240 clark8139 moore 14.54 2018-09-10\n", "195 diane4652 dean2365 6.82 2018-08-21\n", "224 diane4652 dean2365 6.82 2018-09-05\n", "143 donna1922 danderson 79.22 2018-07-23\n", "157 donna1922 danderson 79.22 2018-07-31\n", "116 edwards8658 johnston2281 47.29 2018-07-10\n", "127 edwards8658 johnston2281 47.29 2018-07-16\n", "289 heather smercer 98.13 2018-09-22\n", "290 heather smercer 98.13 2018-09-22\n", "32 hoover4302 dennis 87.77 2018-05-04\n", "33 hoover4302 dennis 87.77 2018-05-05\n", "160 jcoleman fuentes 40.48 2018-08-03\n", "216 jcoleman fuentes 40.48 2018-08-29\n", "82 jones jones5796 95.88 2018-06-21\n", "95 jones jones5796 95.88 2018-06-27\n", "84 kimberly wbrown 9.02 2018-06-24\n", "101 kimberly wbrown 9.02 2018-06-28\n", "60 kirk jason 36.53 2018-05-30\n", "85 kirk jason 36.53 2018-06-24\n", "165 lawrence1685 joshua9690 99.96 2018-08-07\n", "168 lawrence1685 joshua9690 99.96 2018-08-08\n", "114 lspencer sarah 44.89 2018-07-09\n", "194 lspencer sarah 44.89 2018-08-21\n", "44 matthew7940 sherry6113 58.72 2018-05-13\n", "108 matthew7940 sherry6113 58.72 2018-07-04\n", "201 michael7792 jennifer 90.56 2018-08-23\n", "241 michael7792 jennifer 90.56 2018-09-10\n", "75 nicholas.travis corey 73.25 2018-06-15\n", "109 nicholas.travis corey 73.25 2018-07-04\n", "307 patricia martha6969 87.33 2018-09-25\n", "312 patricia martha6969 87.33 2018-09-25\n", "10 paula7980 mackenzie 56.00 2018-03-29\n", "15 paula7980 mackenzie 56.00 2018-04-15\n", "68 paula7980 mackenzie 56.00 2018-06-02\n", "221 phillips5654 miguel.gamble 11.35 2018-09-01\n", "260 phillips5654 miguel.gamble 11.35 2018-09-15\n", "139 rebecca848 mpaul 1.34 2018-07-21\n", "229 rebecca848 mpaul 1.34 2018-09-06\n", "115 rodriguez9475 miller8552 25.89 2018-07-09\n", "121 rodriguez9475 miller8552 25.89 2018-07-13" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create a boolean Series of records that are duplicated. \n", "# Note that `keep=False` marks all that are duplicated\n", "dupes = requests[requests.duplicated(('from_user', 'to_user', 'amount'), keep=False)]\n", "# Order by requester and the date of request. \n", "# Note that `request_date` in this case is a string, but this string date format sorts properly still.\n", "dupes.sort_values(['from_user', 'request_date'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We should get those duplicates out of our successful requests. Let's take a look at some more tools that will help us do cleanup like this one." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.0" } }, "nbformat": 4, "nbformat_minor": 2 }