{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<!-- ---------------------------------------------------- -->\n",
    "  <div class=\"col-sm-3 col-md-3 col-lg-3\">\n",
    "\t<!-- logo -->  \n",
    "    <div class=\"img-responsive\">\n",
    "      <img src=\"https://www.dropbox.com/s/220ncn0o5danuey/pandas-ipython-tutorials-hedaro.jpg?dl=1\" title=\"Pandas Tutorial | Hedaro\" alt=\"Pandas Tutorial | Hedaro\">    \n",
    "    </div>\n",
    "\t<!-- logo -->\t\n",
    "  </div>\n",
    "<!-- ---------------------------------------------------- --> \n",
    "  <div class=\"col-sm-6 col-md-6 col-lg-6\">\n",
    "\t<!-- Pandas Tutorial -->  \n",
    "\t  <center>\n",
    "\t    <br>\n",
    "        <h1>Group By Practice</h1>\n",
    "        <p>Let's practice Pandas using the <strong>Group By</strong> function.</p>\n",
    "\t  </center>\t\n",
    "    <!-- Pandas Tutorial -->\t\n",
    "  </div>\n",
    "<!-- ---------------------------------------------------- -->"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "# import libraries\n",
    "import pandas as pd\n",
    "import sys"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Python version 3.11.7 | packaged by Anaconda, Inc. | (main, Dec 15 2023, 18:05:47) [MSC v.1916 64 bit (AMD64)]\n",
      "Pandas version 2.2.1\n"
     ]
    }
   ],
   "source": [
    "print('Python version ' + sys.version)\n",
    "print('Pandas version ' + pd.__version__)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# The Data\n",
    "\n",
    "---\n",
    "\n",
    "Here is the csv data if you want to follow along:\n",
    "```\n",
    "Date,Symbol,Volume\r\n",
    "1/1/2013,A,0\r\n",
    "1/2/2013,A,200\r\n",
    "1/3/2013,A,1200\r\n",
    "1/4/2013,A,1001\r\n",
    "1/5/2013,A,1300\r\n",
    "1/6/2013,A,1350\r\n",
    "3/8/2013,B,500\r\n",
    "3/9/2013,B,1150\r\n",
    "3/10/2013,B,1180\r\n",
    "3/11/2013,B,2000\r\n",
    "1/5/2013,C,56600\r\n",
    "1/6/2013,C,45000\r\n",
    "1/7/2013,C,200\r\n",
    "5/20/2013,E,1300\r\n",
    "5/21/2013,E,1700\r\n",
    "5/22/2013,E,900\r\n",
    "5/23/2013,E,2100\r\n",
    "5/24/2013,E,8000\r\n",
    "5/25/2013,E,12000\r\n",
    "5/26/2013,E,1900\r\n",
    "5/27/2013,E,1000\r\n",
    "5/28/2013,E,1900\r\n",
    "\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Read CSV file  \n",
    "\n",
    "For those who are following along, note that the code below will most likely return an error for you (the file Test_9_17_Python.csv is not on your computer).  \n",
    "\n",
    "Here are two options for you:  \n",
    "- Use the ```raw = pd.read_clipboard(sep=',')``` method\n",
    "- Manually create the csv and save it in the same location as this notebook"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As you can see we are working with daily trading volume for various stocks over a period of time, from January 1, 2013, to May 28, 2013. \n",
    "\n",
    "Here are the stock Symbols: A, B, C, and E\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Date</th>\n",
       "      <th>Symbol</th>\n",
       "      <th>Volume</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1/1/2013</td>\n",
       "      <td>A</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1/2/2013</td>\n",
       "      <td>A</td>\n",
       "      <td>200</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1/3/2013</td>\n",
       "      <td>A</td>\n",
       "      <td>1200</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1/4/2013</td>\n",
       "      <td>A</td>\n",
       "      <td>1001</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1/5/2013</td>\n",
       "      <td>A</td>\n",
       "      <td>1300</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       Date Symbol  Volume\n",
       "0  1/1/2013      A       0\n",
       "1  1/2/2013      A     200\n",
       "2  1/3/2013      A    1200\n",
       "3  1/4/2013      A    1001\n",
       "4  1/5/2013      A    1300"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "raw = pd.read_csv('Test_9_17_Python.csv')\n",
    "raw.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "At times I like to make a copy of the original dataframe just in case I have to go back to the original data. Yes, we can just recreate the data, but at times I rather have it handy.  \n",
    "\n",
    "> The copy() function allows you to create a truly independent copy of the data, so changes to the new object won't affect the original"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "df = raw.copy()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Fix Date Column  \n",
    "\n",
    "Whenever you import data from basically anywhere, please please check your data types. You want to make sure the data in the correct format before you begin to work with it. \n",
    "\n",
    "As you can see below, the Date column, which represents our dates, came in as a string. We need to fix this."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 22 entries, 0 to 21\n",
      "Data columns (total 3 columns):\n",
      " #   Column  Non-Null Count  Dtype \n",
      "---  ------  --------------  ----- \n",
      " 0   Date    22 non-null     object\n",
      " 1   Symbol  22 non-null     object\n",
      " 2   Volume  22 non-null     int64 \n",
      "dtypes: int64(1), object(2)\n",
      "memory usage: 660.0+ bytes\n"
     ]
    }
   ],
   "source": [
    "df.info()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Pandas to the rescue!  \n",
    "\n",
    "We can automagically convert our dates that are currently represented as strings and converted to datetime objects. Note that it's not always this easy and expect to battle date strings in the future."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "df['Date'] = pd.to_datetime(df['Date'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We now have all our columns looking good.\n",
    "\n",
    "- Our dates are datetime objects\n",
    "- The symbols are strings\n",
    "- The trading volume are integers"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Date      datetime64[ns]\n",
       "Symbol            object\n",
       "Volume             int64\n",
       "dtype: object"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.dtypes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Add a New Column  \n",
    "\n",
    "Let's randomly assign a buy or sell flag to each row in this dataset. This will give us more data to work with and show off some of Pandas GroupBy capabilities."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "We created 22 strings\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "['buy',\n",
       " 'sell',\n",
       " 'buy',\n",
       " 'sell',\n",
       " 'buy',\n",
       " 'sell',\n",
       " 'buy',\n",
       " 'sell',\n",
       " 'buy',\n",
       " 'sell',\n",
       " 'buy',\n",
       " 'sell',\n",
       " 'buy',\n",
       " 'sell',\n",
       " 'buy',\n",
       " 'sell',\n",
       " 'buy',\n",
       " 'sell',\n",
       " 'buy',\n",
       " 'sell',\n",
       " 'buy',\n",
       " 'sell']"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# here are our buy and sell options\n",
    "pool = ['buy','sell']\n",
    "\n",
    "# create our data in the correct length\n",
    "# Since we have 2 items in the dataset, I just multiplied half of the dataset by the list. Can you think of a better solution?\n",
    "pool = pool*int(len(df)/2)\n",
    "\n",
    "# how many strings did we just create?\n",
    "print('We created ' + str(len(pool)) + ' strings')\n",
    "\n",
    "pool"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Date</th>\n",
       "      <th>Symbol</th>\n",
       "      <th>Volume</th>\n",
       "      <th>flag</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2013-01-01</td>\n",
       "      <td>A</td>\n",
       "      <td>0</td>\n",
       "      <td>buy</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2013-01-02</td>\n",
       "      <td>A</td>\n",
       "      <td>200</td>\n",
       "      <td>sell</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2013-01-03</td>\n",
       "      <td>A</td>\n",
       "      <td>1200</td>\n",
       "      <td>buy</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2013-01-04</td>\n",
       "      <td>A</td>\n",
       "      <td>1001</td>\n",
       "      <td>sell</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2013-01-05</td>\n",
       "      <td>A</td>\n",
       "      <td>1300</td>\n",
       "      <td>buy</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        Date Symbol  Volume  flag\n",
       "0 2013-01-01      A       0   buy\n",
       "1 2013-01-02      A     200  sell\n",
       "2 2013-01-03      A    1200   buy\n",
       "3 2013-01-04      A    1001  sell\n",
       "4 2013-01-05      A    1300   buy"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# add the new column\n",
    "df['flag'] = pool\n",
    "\n",
    "# print top 5 rows\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Group by one column  \n",
    "\n",
    "At a high level, the **GroupBy** method will:\n",
    "- Group your data by one or more columns\n",
    "- And then perform some action on each group (like calculate the sum, mean, etc.)\n",
    "\n",
    "In the example below, we are grouping by the column named **Symbol** and then adding all the values found in the **Volume** column for that group.  \n",
    "\n",
    "**As a side note:**\n",
    "If my dataframe has non numeric columns, in this case it's the **flag** column, I have been adding the ```numeric_only=True``` code to avoid the Pandas warning message. It's kind of annoying."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Volume</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Symbol</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>A</th>\n",
       "      <td>5051</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>B</th>\n",
       "      <td>4830</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>C</th>\n",
       "      <td>101800</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>E</th>\n",
       "      <td>30800</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        Volume\n",
       "Symbol        \n",
       "A         5051\n",
       "B         4830\n",
       "C       101800\n",
       "E        30800"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# create group object\n",
    "group = df.groupby('Symbol')\n",
    "\n",
    "# perform a sum function on the group\n",
    "group.sum(numeric_only=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Let's do a Deep Dive\n",
    "\n",
    "When you use the groupby function in Pandas, you are grouping your data by one or more columns. These columns are called the \"grouping columns\" or \"index columns\".  \n",
    "\n",
    "In our example above we grouped by the column **Symbol**:  \n",
    "> ```group = df.groupby('Symbol')```  \n",
    "\n",
    "- The \"Symbol\" column becomes the grouping column  \n",
    "\n",
    "When you apply a function to this grouped DataFrame using **apply**, the function is currently applied to both the grouping columns and the data columns.  \n",
    "- in this case, the grouping column is \"Symbol\"  \n",
    "- in this case, the data column is \"Volume\"  \n",
    "\n",
    "However, in future versions of Pandas, the grouping columns will be excluded from the operation by default. This means that the function will only be applied to the data columns (\"Volume\" in this example), and not to the grouping column named \"Symbol\".  \n",
    "\n",
    "This change is being made to prevent unexpected behavior and to make the apply function more predictable. If you want to include the grouping columns in the operation, you will need to explicitly select them or pass include_groups=True."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Volume</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Symbol</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>A</th>\n",
       "      <td>5051</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>B</th>\n",
       "      <td>4830</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>C</th>\n",
       "      <td>101800</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>E</th>\n",
       "      <td>30800</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        Volume\n",
       "Symbol        \n",
       "A         5051\n",
       "B         4830\n",
       "C       101800\n",
       "E        30800"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "def suma(group):\n",
    "    ''' perform a sum on the group '''\n",
    "    return group.sum(numeric_only=True)\n",
    "\n",
    "group.apply(suma, include_groups=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's mess around with the **transform** method. Unlike the previous example, transform allows us to keep the original shape of our dataframe.\n",
    "\n",
    "To avoid any issues with the non-numeric flag column, I excluded it while creating the group object. Yes, I cheated a bit here."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Date</th>\n",
       "      <th>Symbol</th>\n",
       "      <th>Volume</th>\n",
       "      <th>flag</th>\n",
       "      <th>addition</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2013-01-01</td>\n",
       "      <td>A</td>\n",
       "      <td>0</td>\n",
       "      <td>buy</td>\n",
       "      <td>5051</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2013-01-02</td>\n",
       "      <td>A</td>\n",
       "      <td>200</td>\n",
       "      <td>sell</td>\n",
       "      <td>5051</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2013-01-03</td>\n",
       "      <td>A</td>\n",
       "      <td>1200</td>\n",
       "      <td>buy</td>\n",
       "      <td>5051</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2013-01-04</td>\n",
       "      <td>A</td>\n",
       "      <td>1001</td>\n",
       "      <td>sell</td>\n",
       "      <td>5051</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2013-01-05</td>\n",
       "      <td>A</td>\n",
       "      <td>1300</td>\n",
       "      <td>buy</td>\n",
       "      <td>5051</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>2013-01-06</td>\n",
       "      <td>A</td>\n",
       "      <td>1350</td>\n",
       "      <td>sell</td>\n",
       "      <td>5051</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>2013-03-08</td>\n",
       "      <td>B</td>\n",
       "      <td>500</td>\n",
       "      <td>buy</td>\n",
       "      <td>4830</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>2013-03-09</td>\n",
       "      <td>B</td>\n",
       "      <td>1150</td>\n",
       "      <td>sell</td>\n",
       "      <td>4830</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>2013-03-10</td>\n",
       "      <td>B</td>\n",
       "      <td>1180</td>\n",
       "      <td>buy</td>\n",
       "      <td>4830</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>2013-03-11</td>\n",
       "      <td>B</td>\n",
       "      <td>2000</td>\n",
       "      <td>sell</td>\n",
       "      <td>4830</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        Date Symbol  Volume  flag  addition\n",
       "0 2013-01-01      A       0   buy      5051\n",
       "1 2013-01-02      A     200  sell      5051\n",
       "2 2013-01-03      A    1200   buy      5051\n",
       "3 2013-01-04      A    1001  sell      5051\n",
       "4 2013-01-05      A    1300   buy      5051\n",
       "5 2013-01-06      A    1350  sell      5051\n",
       "6 2013-03-08      B     500   buy      4830\n",
       "7 2013-03-09      B    1150  sell      4830\n",
       "8 2013-03-10      B    1180   buy      4830\n",
       "9 2013-03-11      B    2000  sell      4830"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# create group object\n",
    "group = df[['Symbol','Volume']].groupby('Symbol')\n",
    "\n",
    "def suma(group):\n",
    "    ''' perform a sum on the group '''\n",
    "    return group.sum(numeric_only=True)\n",
    "\n",
    "# create a new column\n",
    "df['addition'] = group.transform(suma)['Volume']\n",
    "df.head(10)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here we get fancy!\n",
    "\n",
    "The function below named **test** will do the following:  \n",
    "- filter rows where **flag** is equal to \"sell\"\n",
    "- filter rows where **Volume** is greater than 1000\n",
    "\n",
    "So we only return the rows that meet the two conditions above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>Date</th>\n",
       "      <th>Volume</th>\n",
       "      <th>flag</th>\n",
       "      <th>addition</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Symbol</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">A</th>\n",
       "      <th>3</th>\n",
       "      <td>2013-01-04</td>\n",
       "      <td>1001</td>\n",
       "      <td>sell</td>\n",
       "      <td>5051</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>2013-01-06</td>\n",
       "      <td>1350</td>\n",
       "      <td>sell</td>\n",
       "      <td>5051</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">B</th>\n",
       "      <th>7</th>\n",
       "      <td>2013-03-09</td>\n",
       "      <td>1150</td>\n",
       "      <td>sell</td>\n",
       "      <td>4830</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>2013-03-11</td>\n",
       "      <td>2000</td>\n",
       "      <td>sell</td>\n",
       "      <td>4830</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>C</th>\n",
       "      <th>11</th>\n",
       "      <td>2013-01-06</td>\n",
       "      <td>45000</td>\n",
       "      <td>sell</td>\n",
       "      <td>101800</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"4\" valign=\"top\">E</th>\n",
       "      <th>13</th>\n",
       "      <td>2013-05-20</td>\n",
       "      <td>1300</td>\n",
       "      <td>sell</td>\n",
       "      <td>30800</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>2013-05-24</td>\n",
       "      <td>8000</td>\n",
       "      <td>sell</td>\n",
       "      <td>30800</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>2013-05-26</td>\n",
       "      <td>1900</td>\n",
       "      <td>sell</td>\n",
       "      <td>30800</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>2013-05-28</td>\n",
       "      <td>1900</td>\n",
       "      <td>sell</td>\n",
       "      <td>30800</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                Date  Volume  flag  addition\n",
       "Symbol                                      \n",
       "A      3  2013-01-04    1001  sell      5051\n",
       "       5  2013-01-06    1350  sell      5051\n",
       "B      7  2013-03-09    1150  sell      4830\n",
       "       9  2013-03-11    2000  sell      4830\n",
       "C      11 2013-01-06   45000  sell    101800\n",
       "E      13 2013-05-20    1300  sell     30800\n",
       "       17 2013-05-24    8000  sell     30800\n",
       "       19 2013-05-26    1900  sell     30800\n",
       "       21 2013-05-28    1900  sell     30800"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "group = df.groupby('Symbol')\n",
    "\n",
    "def test(group):\n",
    "    mask1 = group.apply(lambda x: x.iloc[2]=='sell' and x.iloc[1]>1000, axis=1)\n",
    "    return group[mask1]\n",
    "\n",
    "group.apply(test, include_groups=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Group by two columns\n",
    "\n",
    "The examples below show you how to group by multiple columns. \n",
    "\n",
    "We are not going to go through new material, we are just grouping by 2 columns. You got this!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>Volume</th>\n",
       "      <th>addition</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Symbol</th>\n",
       "      <th>flag</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">A</th>\n",
       "      <th>buy</th>\n",
       "      <td>2500</td>\n",
       "      <td>15153</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>sell</th>\n",
       "      <td>2551</td>\n",
       "      <td>15153</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">B</th>\n",
       "      <th>buy</th>\n",
       "      <td>1680</td>\n",
       "      <td>9660</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>sell</th>\n",
       "      <td>3150</td>\n",
       "      <td>9660</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">C</th>\n",
       "      <th>buy</th>\n",
       "      <td>56800</td>\n",
       "      <td>203600</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>sell</th>\n",
       "      <td>45000</td>\n",
       "      <td>101800</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">E</th>\n",
       "      <th>buy</th>\n",
       "      <td>16800</td>\n",
       "      <td>123200</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>sell</th>\n",
       "      <td>14000</td>\n",
       "      <td>154000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             Volume  addition\n",
       "Symbol flag                  \n",
       "A      buy     2500     15153\n",
       "       sell    2551     15153\n",
       "B      buy     1680      9660\n",
       "       sell    3150      9660\n",
       "C      buy    56800    203600\n",
       "       sell   45000    101800\n",
       "E      buy    16800    123200\n",
       "       sell   14000    154000"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# create group object, remember to pass a list when grouping by multiple columns\n",
    "group = df.groupby(['Symbol', 'flag'])\n",
    "\n",
    "# perform a sum function on the group\n",
    "group.sum(numeric_only=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Symbol  flag\n",
       "A       buy      2500\n",
       "        sell     2551\n",
       "B       buy      1680\n",
       "        sell     3150\n",
       "C       buy     56800\n",
       "        sell    45000\n",
       "E       buy     16800\n",
       "        sell    14000\n",
       "dtype: int64"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# apply a function to the multi column group\n",
    "def suma(group):\n",
    "    return group['Volume'].sum(numeric_only=True)\n",
    "\n",
    "group.apply(suma, include_groups=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Date</th>\n",
       "      <th>Symbol</th>\n",
       "      <th>Volume</th>\n",
       "      <th>flag</th>\n",
       "      <th>addition</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2013-01-01</td>\n",
       "      <td>A</td>\n",
       "      <td>0</td>\n",
       "      <td>buy</td>\n",
       "      <td>2500</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2013-01-02</td>\n",
       "      <td>A</td>\n",
       "      <td>200</td>\n",
       "      <td>sell</td>\n",
       "      <td>2551</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2013-01-03</td>\n",
       "      <td>A</td>\n",
       "      <td>1200</td>\n",
       "      <td>buy</td>\n",
       "      <td>2500</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2013-01-04</td>\n",
       "      <td>A</td>\n",
       "      <td>1001</td>\n",
       "      <td>sell</td>\n",
       "      <td>2551</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2013-01-05</td>\n",
       "      <td>A</td>\n",
       "      <td>1300</td>\n",
       "      <td>buy</td>\n",
       "      <td>2500</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        Date Symbol  Volume  flag  addition\n",
       "0 2013-01-01      A       0   buy      2500\n",
       "1 2013-01-02      A     200  sell      2551\n",
       "2 2013-01-03      A    1200   buy      2500\n",
       "3 2013-01-04      A    1001  sell      2551\n",
       "4 2013-01-05      A    1300   buy      2500"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "group = df[['Symbol','Volume','flag']].groupby(['Symbol', 'flag'])\n",
    "\n",
    "def suma(group):\n",
    "    return group.sum(numeric_only=True)\n",
    "\n",
    "df['addition'] = group.transform(suma)['Volume']\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>Date</th>\n",
       "      <th>Volume</th>\n",
       "      <th>addition</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Symbol</th>\n",
       "      <th>flag</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"4\" valign=\"top\">A</th>\n",
       "      <th rowspan=\"2\" valign=\"top\">buy</th>\n",
       "      <th>2</th>\n",
       "      <td>2013-01-03</td>\n",
       "      <td>1200</td>\n",
       "      <td>2500</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2013-01-05</td>\n",
       "      <td>1300</td>\n",
       "      <td>2500</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">sell</th>\n",
       "      <th>3</th>\n",
       "      <td>2013-01-04</td>\n",
       "      <td>1001</td>\n",
       "      <td>2551</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>2013-01-06</td>\n",
       "      <td>1350</td>\n",
       "      <td>2551</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"3\" valign=\"top\">B</th>\n",
       "      <th>buy</th>\n",
       "      <th>8</th>\n",
       "      <td>2013-03-10</td>\n",
       "      <td>1180</td>\n",
       "      <td>1680</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">sell</th>\n",
       "      <th>7</th>\n",
       "      <td>2013-03-09</td>\n",
       "      <td>1150</td>\n",
       "      <td>3150</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>2013-03-11</td>\n",
       "      <td>2000</td>\n",
       "      <td>3150</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">C</th>\n",
       "      <th>buy</th>\n",
       "      <th>10</th>\n",
       "      <td>2013-01-05</td>\n",
       "      <td>56600</td>\n",
       "      <td>56800</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>sell</th>\n",
       "      <th>11</th>\n",
       "      <td>2013-01-06</td>\n",
       "      <td>45000</td>\n",
       "      <td>45000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"7\" valign=\"top\">E</th>\n",
       "      <th rowspan=\"3\" valign=\"top\">buy</th>\n",
       "      <th>14</th>\n",
       "      <td>2013-05-21</td>\n",
       "      <td>1700</td>\n",
       "      <td>16800</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>2013-05-23</td>\n",
       "      <td>2100</td>\n",
       "      <td>16800</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>2013-05-25</td>\n",
       "      <td>12000</td>\n",
       "      <td>16800</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"4\" valign=\"top\">sell</th>\n",
       "      <th>13</th>\n",
       "      <td>2013-05-20</td>\n",
       "      <td>1300</td>\n",
       "      <td>14000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>2013-05-24</td>\n",
       "      <td>8000</td>\n",
       "      <td>14000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>2013-05-26</td>\n",
       "      <td>1900</td>\n",
       "      <td>14000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>2013-05-28</td>\n",
       "      <td>1900</td>\n",
       "      <td>14000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                     Date  Volume  addition\n",
       "Symbol flag                                \n",
       "A      buy  2  2013-01-03    1200      2500\n",
       "            4  2013-01-05    1300      2500\n",
       "       sell 3  2013-01-04    1001      2551\n",
       "            5  2013-01-06    1350      2551\n",
       "B      buy  8  2013-03-10    1180      1680\n",
       "       sell 7  2013-03-09    1150      3150\n",
       "            9  2013-03-11    2000      3150\n",
       "C      buy  10 2013-01-05   56600     56800\n",
       "       sell 11 2013-01-06   45000     45000\n",
       "E      buy  14 2013-05-21    1700     16800\n",
       "            16 2013-05-23    2100     16800\n",
       "            18 2013-05-25   12000     16800\n",
       "       sell 13 2013-05-20    1300     14000\n",
       "            17 2013-05-24    8000     14000\n",
       "            19 2013-05-26    1900     14000\n",
       "            21 2013-05-28    1900     14000"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "group = df.groupby(['Symbol', 'flag'])\n",
    "\n",
    "def test(group):\n",
    "    mask1 = group.apply(lambda x: x.iloc[1]>1000, axis=1)\n",
    "    return group[mask1]\n",
    "\n",
    "group.apply(test)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "source": [
    "<p class=\"text-muted\">This tutorial was created by <a href=\"http://www.hedaro.com\" target=\"_blank\"><strong>HEDARO</strong></a></p>"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}