{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "
\n", "\t \n", "
\n", " \"Pandas \n", "
\n", "\t\t\n", "
\n", " \n", "
\n", "\t \n", "\t
\n", "\t
\n", "

Group By Practice

\n", "

Let's practice Pandas using the Group By function.

\n", "\t
\t\n", " \t\n", "
\n", "" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "# import libraries\n", "import pandas as pd\n", "import sys" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Python version 3.11.7 | packaged by Anaconda, Inc. | (main, Dec 15 2023, 18:05:47) [MSC v.1916 64 bit (AMD64)]\n", "Pandas version 2.2.1\n" ] } ], "source": [ "print('Python version ' + sys.version)\n", "print('Pandas version ' + pd.__version__)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# The Data\n", "\n", "---\n", "\n", "Here is the csv data if you want to follow along:\n", "```\n", "Date,Symbol,Volume\r\n", "1/1/2013,A,0\r\n", "1/2/2013,A,200\r\n", "1/3/2013,A,1200\r\n", "1/4/2013,A,1001\r\n", "1/5/2013,A,1300\r\n", "1/6/2013,A,1350\r\n", "3/8/2013,B,500\r\n", "3/9/2013,B,1150\r\n", "3/10/2013,B,1180\r\n", "3/11/2013,B,2000\r\n", "1/5/2013,C,56600\r\n", "1/6/2013,C,45000\r\n", "1/7/2013,C,200\r\n", "5/20/2013,E,1300\r\n", "5/21/2013,E,1700\r\n", "5/22/2013,E,900\r\n", "5/23/2013,E,2100\r\n", "5/24/2013,E,8000\r\n", "5/25/2013,E,12000\r\n", "5/26/2013,E,1900\r\n", "5/27/2013,E,1000\r\n", "5/28/2013,E,1900\r\n", "\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Read CSV file \n", "\n", "For those who are following along, note that the code below will most likely return an error for you (the file Test_9_17_Python.csv is not on your computer). \n", "\n", "Here are two options for you: \n", "- Use the ```raw = pd.read_clipboard(sep=',')``` method\n", "- Manually create the csv and save it in the same location as this notebook" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As you can see we are working with daily trading volume for various stocks over a period of time, from January 1, 2013, to May 28, 2013. \n", "\n", "Here are the stock Symbols: A, B, C, and E\n" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
DateSymbolVolume
01/1/2013A0
11/2/2013A200
21/3/2013A1200
31/4/2013A1001
41/5/2013A1300
\n", "
" ], "text/plain": [ " Date Symbol Volume\n", "0 1/1/2013 A 0\n", "1 1/2/2013 A 200\n", "2 1/3/2013 A 1200\n", "3 1/4/2013 A 1001\n", "4 1/5/2013 A 1300" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "raw = pd.read_csv('Test_9_17_Python.csv')\n", "raw.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "At times I like to make a copy of the original dataframe just in case I have to go back to the original data. Yes, we can just recreate the data, but at times I rather have it handy. \n", "\n", "> The copy() function allows you to create a truly independent copy of the data, so changes to the new object won't affect the original" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "df = raw.copy()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Fix Date Column \n", "\n", "Whenever you import data from basically anywhere, please please check your data types. You want to make sure the data in the correct format before you begin to work with it. \n", "\n", "As you can see below, the Date column, which represents our dates, came in as a string. We need to fix this." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "RangeIndex: 22 entries, 0 to 21\n", "Data columns (total 3 columns):\n", " # Column Non-Null Count Dtype \n", "--- ------ -------------- ----- \n", " 0 Date 22 non-null object\n", " 1 Symbol 22 non-null object\n", " 2 Volume 22 non-null int64 \n", "dtypes: int64(1), object(2)\n", "memory usage: 660.0+ bytes\n" ] } ], "source": [ "df.info()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Pandas to the rescue! \n", "\n", "We can automagically convert our dates that are currently represented as strings and converted to datetime objects. Note that it's not always this easy and expect to battle date strings in the future." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "df['Date'] = pd.to_datetime(df['Date'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We now have all our columns looking good.\n", "\n", "- Our dates are datetime objects\n", "- The symbols are strings\n", "- The trading volume are integers" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "text/plain": [ "Date datetime64[ns]\n", "Symbol object\n", "Volume int64\n", "dtype: object" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.dtypes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Add a New Column \n", "\n", "Let's randomly assign a buy or sell flag to each row in this dataset. This will give us more data to work with and show off some of Pandas GroupBy capabilities." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "We created 22 strings\n" ] }, { "data": { "text/plain": [ "['buy',\n", " 'sell',\n", " 'buy',\n", " 'sell',\n", " 'buy',\n", " 'sell',\n", " 'buy',\n", " 'sell',\n", " 'buy',\n", " 'sell',\n", " 'buy',\n", " 'sell',\n", " 'buy',\n", " 'sell',\n", " 'buy',\n", " 'sell',\n", " 'buy',\n", " 'sell',\n", " 'buy',\n", " 'sell',\n", " 'buy',\n", " 'sell']" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# here are our buy and sell options\n", "pool = ['buy','sell']\n", "\n", "# create our data in the correct length\n", "# Since we have 2 items in the dataset, I just multiplied half of the dataset by the list. Can you think of a better solution?\n", "pool = pool*int(len(df)/2)\n", "\n", "# how many strings did we just create?\n", "print('We created ' + str(len(pool)) + ' strings')\n", "\n", "pool" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
DateSymbolVolumeflag
02013-01-01A0buy
12013-01-02A200sell
22013-01-03A1200buy
32013-01-04A1001sell
42013-01-05A1300buy
\n", "
" ], "text/plain": [ " Date Symbol Volume flag\n", "0 2013-01-01 A 0 buy\n", "1 2013-01-02 A 200 sell\n", "2 2013-01-03 A 1200 buy\n", "3 2013-01-04 A 1001 sell\n", "4 2013-01-05 A 1300 buy" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# add the new column\n", "df['flag'] = pool\n", "\n", "# print top 5 rows\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Group by one column \n", "\n", "At a high level, the **GroupBy** method will:\n", "- Group your data by one or more columns\n", "- And then perform some action on each group (like calculate the sum, mean, etc.)\n", "\n", "In the example below, we are grouping by the column named **Symbol** and then adding all the values found in the **Volume** column for that group. \n", "\n", "**As a side note:**\n", "If my dataframe has non numeric columns, in this case it's the **flag** column, I have been adding the ```numeric_only=True``` code to avoid the Pandas warning message. It's kind of annoying." ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Volume
Symbol
A5051
B4830
C101800
E30800
\n", "
" ], "text/plain": [ " Volume\n", "Symbol \n", "A 5051\n", "B 4830\n", "C 101800\n", "E 30800" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# create group object\n", "group = df.groupby('Symbol')\n", "\n", "# perform a sum function on the group\n", "group.sum(numeric_only=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Let's do a Deep Dive\n", "\n", "When you use the groupby function in Pandas, you are grouping your data by one or more columns. These columns are called the \"grouping columns\" or \"index columns\". \n", "\n", "In our example above we grouped by the column **Symbol**: \n", "> ```group = df.groupby('Symbol')``` \n", "\n", "- The \"Symbol\" column becomes the grouping column \n", "\n", "When you apply a function to this grouped DataFrame using **apply**, the function is currently applied to both the grouping columns and the data columns. \n", "- in this case, the grouping column is \"Symbol\" \n", "- in this case, the data column is \"Volume\" \n", "\n", "However, in future versions of Pandas, the grouping columns will be excluded from the operation by default. This means that the function will only be applied to the data columns (\"Volume\" in this example), and not to the grouping column named \"Symbol\". \n", "\n", "This change is being made to prevent unexpected behavior and to make the apply function more predictable. If you want to include the grouping columns in the operation, you will need to explicitly select them or pass include_groups=True." ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Volume
Symbol
A5051
B4830
C101800
E30800
\n", "
" ], "text/plain": [ " Volume\n", "Symbol \n", "A 5051\n", "B 4830\n", "C 101800\n", "E 30800" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def suma(group):\n", " ''' perform a sum on the group '''\n", " return group.sum(numeric_only=True)\n", "\n", "group.apply(suma, include_groups=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's mess around with the **transform** method. Unlike the previous example, transform allows us to keep the original shape of our dataframe.\n", "\n", "To avoid any issues with the non-numeric flag column, I excluded it while creating the group object. Yes, I cheated a bit here." ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
DateSymbolVolumeflagaddition
02013-01-01A0buy5051
12013-01-02A200sell5051
22013-01-03A1200buy5051
32013-01-04A1001sell5051
42013-01-05A1300buy5051
52013-01-06A1350sell5051
62013-03-08B500buy4830
72013-03-09B1150sell4830
82013-03-10B1180buy4830
92013-03-11B2000sell4830
\n", "
" ], "text/plain": [ " Date Symbol Volume flag addition\n", "0 2013-01-01 A 0 buy 5051\n", "1 2013-01-02 A 200 sell 5051\n", "2 2013-01-03 A 1200 buy 5051\n", "3 2013-01-04 A 1001 sell 5051\n", "4 2013-01-05 A 1300 buy 5051\n", "5 2013-01-06 A 1350 sell 5051\n", "6 2013-03-08 B 500 buy 4830\n", "7 2013-03-09 B 1150 sell 4830\n", "8 2013-03-10 B 1180 buy 4830\n", "9 2013-03-11 B 2000 sell 4830" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# create group object\n", "group = df[['Symbol','Volume']].groupby('Symbol')\n", "\n", "def suma(group):\n", " ''' perform a sum on the group '''\n", " return group.sum(numeric_only=True)\n", "\n", "# create a new column\n", "df['addition'] = group.transform(suma)['Volume']\n", "df.head(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here we get fancy!\n", "\n", "The function below named **test** will do the following: \n", "- filter rows where **flag** is equal to \"sell\"\n", "- filter rows where **Volume** is greater than 1000\n", "\n", "So we only return the rows that meet the two conditions above." ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
DateVolumeflagaddition
Symbol
A32013-01-041001sell5051
52013-01-061350sell5051
B72013-03-091150sell4830
92013-03-112000sell4830
C112013-01-0645000sell101800
E132013-05-201300sell30800
172013-05-248000sell30800
192013-05-261900sell30800
212013-05-281900sell30800
\n", "
" ], "text/plain": [ " Date Volume flag addition\n", "Symbol \n", "A 3 2013-01-04 1001 sell 5051\n", " 5 2013-01-06 1350 sell 5051\n", "B 7 2013-03-09 1150 sell 4830\n", " 9 2013-03-11 2000 sell 4830\n", "C 11 2013-01-06 45000 sell 101800\n", "E 13 2013-05-20 1300 sell 30800\n", " 17 2013-05-24 8000 sell 30800\n", " 19 2013-05-26 1900 sell 30800\n", " 21 2013-05-28 1900 sell 30800" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "group = df.groupby('Symbol')\n", "\n", "def test(group):\n", " mask1 = group.apply(lambda x: x.iloc[2]=='sell' and x.iloc[1]>1000, axis=1)\n", " return group[mask1]\n", "\n", "group.apply(test, include_groups=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Group by two columns\n", "\n", "The examples below show you how to group by multiple columns. \n", "\n", "We are not going to go through new material, we are just grouping by 2 columns. You got this!" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Volumeaddition
Symbolflag
Abuy250015153
sell255115153
Bbuy16809660
sell31509660
Cbuy56800203600
sell45000101800
Ebuy16800123200
sell14000154000
\n", "
" ], "text/plain": [ " Volume addition\n", "Symbol flag \n", "A buy 2500 15153\n", " sell 2551 15153\n", "B buy 1680 9660\n", " sell 3150 9660\n", "C buy 56800 203600\n", " sell 45000 101800\n", "E buy 16800 123200\n", " sell 14000 154000" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# create group object, remember to pass a list when grouping by multiple columns\n", "group = df.groupby(['Symbol', 'flag'])\n", "\n", "# perform a sum function on the group\n", "group.sum(numeric_only=True)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "text/plain": [ "Symbol flag\n", "A buy 2500\n", " sell 2551\n", "B buy 1680\n", " sell 3150\n", "C buy 56800\n", " sell 45000\n", "E buy 16800\n", " sell 14000\n", "dtype: int64" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# apply a function to the multi column group\n", "def suma(group):\n", " return group['Volume'].sum(numeric_only=True)\n", "\n", "group.apply(suma, include_groups=False)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
DateSymbolVolumeflagaddition
02013-01-01A0buy2500
12013-01-02A200sell2551
22013-01-03A1200buy2500
32013-01-04A1001sell2551
42013-01-05A1300buy2500
\n", "
" ], "text/plain": [ " Date Symbol Volume flag addition\n", "0 2013-01-01 A 0 buy 2500\n", "1 2013-01-02 A 200 sell 2551\n", "2 2013-01-03 A 1200 buy 2500\n", "3 2013-01-04 A 1001 sell 2551\n", "4 2013-01-05 A 1300 buy 2500" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "group = df[['Symbol','Volume','flag']].groupby(['Symbol', 'flag'])\n", "\n", "def suma(group):\n", " return group.sum(numeric_only=True)\n", "\n", "df['addition'] = group.transform(suma)['Volume']\n", "df.head()" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
DateVolumeaddition
Symbolflag
Abuy22013-01-0312002500
42013-01-0513002500
sell32013-01-0410012551
52013-01-0613502551
Bbuy82013-03-1011801680
sell72013-03-0911503150
92013-03-1120003150
Cbuy102013-01-055660056800
sell112013-01-064500045000
Ebuy142013-05-21170016800
162013-05-23210016800
182013-05-251200016800
sell132013-05-20130014000
172013-05-24800014000
192013-05-26190014000
212013-05-28190014000
\n", "
" ], "text/plain": [ " Date Volume addition\n", "Symbol flag \n", "A buy 2 2013-01-03 1200 2500\n", " 4 2013-01-05 1300 2500\n", " sell 3 2013-01-04 1001 2551\n", " 5 2013-01-06 1350 2551\n", "B buy 8 2013-03-10 1180 1680\n", " sell 7 2013-03-09 1150 3150\n", " 9 2013-03-11 2000 3150\n", "C buy 10 2013-01-05 56600 56800\n", " sell 11 2013-01-06 45000 45000\n", "E buy 14 2013-05-21 1700 16800\n", " 16 2013-05-23 2100 16800\n", " 18 2013-05-25 12000 16800\n", " sell 13 2013-05-20 1300 14000\n", " 17 2013-05-24 8000 14000\n", " 19 2013-05-26 1900 14000\n", " 21 2013-05-28 1900 14000" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "group = df.groupby(['Symbol', 'flag'])\n", "\n", "def test(group):\n", " mask1 = group.apply(lambda x: x.iloc[1]>1000, axis=1)\n", " return group[mask1]\n", "\n", "group.apply(test)" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "source": [ "

This tutorial was created by HEDARO

" ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.7" } }, "nbformat": 4, "nbformat_minor": 4 }