{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"
\n",
"\t \n",
"
\n",
"
\n",
"
\n",
"\t\t\n",
"
\n",
" \n",
" \n",
"\t \n",
"\t
\n",
"\t
\n",
" Group By Practice
\n",
" Let's practice Pandas using the Group By function.
\n",
"\t \t\n",
" \t\n",
"
\n",
""
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
"# import libraries\n",
"import pandas as pd\n",
"import sys"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Python version 3.11.7 | packaged by Anaconda, Inc. | (main, Dec 15 2023, 18:05:47) [MSC v.1916 64 bit (AMD64)]\n",
"Pandas version 2.2.1\n"
]
}
],
"source": [
"print('Python version ' + sys.version)\n",
"print('Pandas version ' + pd.__version__)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# The Data\n",
"\n",
"---\n",
"\n",
"Here is the csv data if you want to follow along:\n",
"```\n",
"Date,Symbol,Volume\r\n",
"1/1/2013,A,0\r\n",
"1/2/2013,A,200\r\n",
"1/3/2013,A,1200\r\n",
"1/4/2013,A,1001\r\n",
"1/5/2013,A,1300\r\n",
"1/6/2013,A,1350\r\n",
"3/8/2013,B,500\r\n",
"3/9/2013,B,1150\r\n",
"3/10/2013,B,1180\r\n",
"3/11/2013,B,2000\r\n",
"1/5/2013,C,56600\r\n",
"1/6/2013,C,45000\r\n",
"1/7/2013,C,200\r\n",
"5/20/2013,E,1300\r\n",
"5/21/2013,E,1700\r\n",
"5/22/2013,E,900\r\n",
"5/23/2013,E,2100\r\n",
"5/24/2013,E,8000\r\n",
"5/25/2013,E,12000\r\n",
"5/26/2013,E,1900\r\n",
"5/27/2013,E,1000\r\n",
"5/28/2013,E,1900\r\n",
"\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Read CSV file \n",
"\n",
"For those who are following along, note that the code below will most likely return an error for you (the file Test_9_17_Python.csv is not on your computer). \n",
"\n",
"Here are two options for you: \n",
"- Use the ```raw = pd.read_clipboard(sep=',')``` method\n",
"- Manually create the csv and save it in the same location as this notebook"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As you can see we are working with daily trading volume for various stocks over a period of time, from January 1, 2013, to May 28, 2013. \n",
"\n",
"Here are the stock Symbols: A, B, C, and E\n"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Date | \n",
" Symbol | \n",
" Volume | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1/1/2013 | \n",
" A | \n",
" 0 | \n",
"
\n",
" \n",
" 1 | \n",
" 1/2/2013 | \n",
" A | \n",
" 200 | \n",
"
\n",
" \n",
" 2 | \n",
" 1/3/2013 | \n",
" A | \n",
" 1200 | \n",
"
\n",
" \n",
" 3 | \n",
" 1/4/2013 | \n",
" A | \n",
" 1001 | \n",
"
\n",
" \n",
" 4 | \n",
" 1/5/2013 | \n",
" A | \n",
" 1300 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Date Symbol Volume\n",
"0 1/1/2013 A 0\n",
"1 1/2/2013 A 200\n",
"2 1/3/2013 A 1200\n",
"3 1/4/2013 A 1001\n",
"4 1/5/2013 A 1300"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"raw = pd.read_csv('Test_9_17_Python.csv')\n",
"raw.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"At times I like to make a copy of the original dataframe just in case I have to go back to the original data. Yes, we can just recreate the data, but at times I rather have it handy. \n",
"\n",
"> The copy() function allows you to create a truly independent copy of the data, so changes to the new object won't affect the original"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
"df = raw.copy()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Fix Date Column \n",
"\n",
"Whenever you import data from basically anywhere, please please check your data types. You want to make sure the data in the correct format before you begin to work with it. \n",
"\n",
"As you can see below, the Date column, which represents our dates, came in as a string. We need to fix this."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"RangeIndex: 22 entries, 0 to 21\n",
"Data columns (total 3 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 Date 22 non-null object\n",
" 1 Symbol 22 non-null object\n",
" 2 Volume 22 non-null int64 \n",
"dtypes: int64(1), object(2)\n",
"memory usage: 660.0+ bytes\n"
]
}
],
"source": [
"df.info()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Pandas to the rescue! \n",
"\n",
"We can automagically convert our dates that are currently represented as strings and converted to datetime objects. Note that it's not always this easy and expect to battle date strings in the future."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
"df['Date'] = pd.to_datetime(df['Date'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We now have all our columns looking good.\n",
"\n",
"- Our dates are datetime objects\n",
"- The symbols are strings\n",
"- The trading volume are integers"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"data": {
"text/plain": [
"Date datetime64[ns]\n",
"Symbol object\n",
"Volume int64\n",
"dtype: object"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.dtypes"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Add a New Column \n",
"\n",
"Let's randomly assign a buy or sell flag to each row in this dataset. This will give us more data to work with and show off some of Pandas GroupBy capabilities."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"We created 22 strings\n"
]
},
{
"data": {
"text/plain": [
"['buy',\n",
" 'sell',\n",
" 'buy',\n",
" 'sell',\n",
" 'buy',\n",
" 'sell',\n",
" 'buy',\n",
" 'sell',\n",
" 'buy',\n",
" 'sell',\n",
" 'buy',\n",
" 'sell',\n",
" 'buy',\n",
" 'sell',\n",
" 'buy',\n",
" 'sell',\n",
" 'buy',\n",
" 'sell',\n",
" 'buy',\n",
" 'sell',\n",
" 'buy',\n",
" 'sell']"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# here are our buy and sell options\n",
"pool = ['buy','sell']\n",
"\n",
"# create our data in the correct length\n",
"# Since we have 2 items in the dataset, I just multiplied half of the dataset by the list. Can you think of a better solution?\n",
"pool = pool*int(len(df)/2)\n",
"\n",
"# how many strings did we just create?\n",
"print('We created ' + str(len(pool)) + ' strings')\n",
"\n",
"pool"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Date | \n",
" Symbol | \n",
" Volume | \n",
" flag | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 2013-01-01 | \n",
" A | \n",
" 0 | \n",
" buy | \n",
"
\n",
" \n",
" 1 | \n",
" 2013-01-02 | \n",
" A | \n",
" 200 | \n",
" sell | \n",
"
\n",
" \n",
" 2 | \n",
" 2013-01-03 | \n",
" A | \n",
" 1200 | \n",
" buy | \n",
"
\n",
" \n",
" 3 | \n",
" 2013-01-04 | \n",
" A | \n",
" 1001 | \n",
" sell | \n",
"
\n",
" \n",
" 4 | \n",
" 2013-01-05 | \n",
" A | \n",
" 1300 | \n",
" buy | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Date Symbol Volume flag\n",
"0 2013-01-01 A 0 buy\n",
"1 2013-01-02 A 200 sell\n",
"2 2013-01-03 A 1200 buy\n",
"3 2013-01-04 A 1001 sell\n",
"4 2013-01-05 A 1300 buy"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# add the new column\n",
"df['flag'] = pool\n",
"\n",
"# print top 5 rows\n",
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Group by one column \n",
"\n",
"At a high level, the **GroupBy** method will:\n",
"- Group your data by one or more columns\n",
"- And then perform some action on each group (like calculate the sum, mean, etc.)\n",
"\n",
"In the example below, we are grouping by the column named **Symbol** and then adding all the values found in the **Volume** column for that group. \n",
"\n",
"**As a side note:**\n",
"If my dataframe has non numeric columns, in this case it's the **flag** column, I have been adding the ```numeric_only=True``` code to avoid the Pandas warning message. It's kind of annoying."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Volume | \n",
"
\n",
" \n",
" Symbol | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" A | \n",
" 5051 | \n",
"
\n",
" \n",
" B | \n",
" 4830 | \n",
"
\n",
" \n",
" C | \n",
" 101800 | \n",
"
\n",
" \n",
" E | \n",
" 30800 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Volume\n",
"Symbol \n",
"A 5051\n",
"B 4830\n",
"C 101800\n",
"E 30800"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# create group object\n",
"group = df.groupby('Symbol')\n",
"\n",
"# perform a sum function on the group\n",
"group.sum(numeric_only=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Let's do a Deep Dive\n",
"\n",
"When you use the groupby function in Pandas, you are grouping your data by one or more columns. These columns are called the \"grouping columns\" or \"index columns\". \n",
"\n",
"In our example above we grouped by the column **Symbol**: \n",
"> ```group = df.groupby('Symbol')``` \n",
"\n",
"- The \"Symbol\" column becomes the grouping column \n",
"\n",
"When you apply a function to this grouped DataFrame using **apply**, the function is currently applied to both the grouping columns and the data columns. \n",
"- in this case, the grouping column is \"Symbol\" \n",
"- in this case, the data column is \"Volume\" \n",
"\n",
"However, in future versions of Pandas, the grouping columns will be excluded from the operation by default. This means that the function will only be applied to the data columns (\"Volume\" in this example), and not to the grouping column named \"Symbol\". \n",
"\n",
"This change is being made to prevent unexpected behavior and to make the apply function more predictable. If you want to include the grouping columns in the operation, you will need to explicitly select them or pass include_groups=True."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Volume | \n",
"
\n",
" \n",
" Symbol | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" A | \n",
" 5051 | \n",
"
\n",
" \n",
" B | \n",
" 4830 | \n",
"
\n",
" \n",
" C | \n",
" 101800 | \n",
"
\n",
" \n",
" E | \n",
" 30800 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Volume\n",
"Symbol \n",
"A 5051\n",
"B 4830\n",
"C 101800\n",
"E 30800"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def suma(group):\n",
" ''' perform a sum on the group '''\n",
" return group.sum(numeric_only=True)\n",
"\n",
"group.apply(suma, include_groups=False)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's mess around with the **transform** method. Unlike the previous example, transform allows us to keep the original shape of our dataframe.\n",
"\n",
"To avoid any issues with the non-numeric flag column, I excluded it while creating the group object. Yes, I cheated a bit here."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Date | \n",
" Symbol | \n",
" Volume | \n",
" flag | \n",
" addition | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 2013-01-01 | \n",
" A | \n",
" 0 | \n",
" buy | \n",
" 5051 | \n",
"
\n",
" \n",
" 1 | \n",
" 2013-01-02 | \n",
" A | \n",
" 200 | \n",
" sell | \n",
" 5051 | \n",
"
\n",
" \n",
" 2 | \n",
" 2013-01-03 | \n",
" A | \n",
" 1200 | \n",
" buy | \n",
" 5051 | \n",
"
\n",
" \n",
" 3 | \n",
" 2013-01-04 | \n",
" A | \n",
" 1001 | \n",
" sell | \n",
" 5051 | \n",
"
\n",
" \n",
" 4 | \n",
" 2013-01-05 | \n",
" A | \n",
" 1300 | \n",
" buy | \n",
" 5051 | \n",
"
\n",
" \n",
" 5 | \n",
" 2013-01-06 | \n",
" A | \n",
" 1350 | \n",
" sell | \n",
" 5051 | \n",
"
\n",
" \n",
" 6 | \n",
" 2013-03-08 | \n",
" B | \n",
" 500 | \n",
" buy | \n",
" 4830 | \n",
"
\n",
" \n",
" 7 | \n",
" 2013-03-09 | \n",
" B | \n",
" 1150 | \n",
" sell | \n",
" 4830 | \n",
"
\n",
" \n",
" 8 | \n",
" 2013-03-10 | \n",
" B | \n",
" 1180 | \n",
" buy | \n",
" 4830 | \n",
"
\n",
" \n",
" 9 | \n",
" 2013-03-11 | \n",
" B | \n",
" 2000 | \n",
" sell | \n",
" 4830 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Date Symbol Volume flag addition\n",
"0 2013-01-01 A 0 buy 5051\n",
"1 2013-01-02 A 200 sell 5051\n",
"2 2013-01-03 A 1200 buy 5051\n",
"3 2013-01-04 A 1001 sell 5051\n",
"4 2013-01-05 A 1300 buy 5051\n",
"5 2013-01-06 A 1350 sell 5051\n",
"6 2013-03-08 B 500 buy 4830\n",
"7 2013-03-09 B 1150 sell 4830\n",
"8 2013-03-10 B 1180 buy 4830\n",
"9 2013-03-11 B 2000 sell 4830"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# create group object\n",
"group = df[['Symbol','Volume']].groupby('Symbol')\n",
"\n",
"def suma(group):\n",
" ''' perform a sum on the group '''\n",
" return group.sum(numeric_only=True)\n",
"\n",
"# create a new column\n",
"df['addition'] = group.transform(suma)['Volume']\n",
"df.head(10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here we get fancy!\n",
"\n",
"The function below named **test** will do the following: \n",
"- filter rows where **flag** is equal to \"sell\"\n",
"- filter rows where **Volume** is greater than 1000\n",
"\n",
"So we only return the rows that meet the two conditions above."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" | \n",
" Date | \n",
" Volume | \n",
" flag | \n",
" addition | \n",
"
\n",
" \n",
" Symbol | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" A | \n",
" 3 | \n",
" 2013-01-04 | \n",
" 1001 | \n",
" sell | \n",
" 5051 | \n",
"
\n",
" \n",
" 5 | \n",
" 2013-01-06 | \n",
" 1350 | \n",
" sell | \n",
" 5051 | \n",
"
\n",
" \n",
" B | \n",
" 7 | \n",
" 2013-03-09 | \n",
" 1150 | \n",
" sell | \n",
" 4830 | \n",
"
\n",
" \n",
" 9 | \n",
" 2013-03-11 | \n",
" 2000 | \n",
" sell | \n",
" 4830 | \n",
"
\n",
" \n",
" C | \n",
" 11 | \n",
" 2013-01-06 | \n",
" 45000 | \n",
" sell | \n",
" 101800 | \n",
"
\n",
" \n",
" E | \n",
" 13 | \n",
" 2013-05-20 | \n",
" 1300 | \n",
" sell | \n",
" 30800 | \n",
"
\n",
" \n",
" 17 | \n",
" 2013-05-24 | \n",
" 8000 | \n",
" sell | \n",
" 30800 | \n",
"
\n",
" \n",
" 19 | \n",
" 2013-05-26 | \n",
" 1900 | \n",
" sell | \n",
" 30800 | \n",
"
\n",
" \n",
" 21 | \n",
" 2013-05-28 | \n",
" 1900 | \n",
" sell | \n",
" 30800 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Date Volume flag addition\n",
"Symbol \n",
"A 3 2013-01-04 1001 sell 5051\n",
" 5 2013-01-06 1350 sell 5051\n",
"B 7 2013-03-09 1150 sell 4830\n",
" 9 2013-03-11 2000 sell 4830\n",
"C 11 2013-01-06 45000 sell 101800\n",
"E 13 2013-05-20 1300 sell 30800\n",
" 17 2013-05-24 8000 sell 30800\n",
" 19 2013-05-26 1900 sell 30800\n",
" 21 2013-05-28 1900 sell 30800"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"group = df.groupby('Symbol')\n",
"\n",
"def test(group):\n",
" mask1 = group.apply(lambda x: x.iloc[2]=='sell' and x.iloc[1]>1000, axis=1)\n",
" return group[mask1]\n",
"\n",
"group.apply(test, include_groups=False)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Group by two columns\n",
"\n",
"The examples below show you how to group by multiple columns. \n",
"\n",
"We are not going to go through new material, we are just grouping by 2 columns. You got this!"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" | \n",
" Volume | \n",
" addition | \n",
"
\n",
" \n",
" Symbol | \n",
" flag | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" A | \n",
" buy | \n",
" 2500 | \n",
" 15153 | \n",
"
\n",
" \n",
" sell | \n",
" 2551 | \n",
" 15153 | \n",
"
\n",
" \n",
" B | \n",
" buy | \n",
" 1680 | \n",
" 9660 | \n",
"
\n",
" \n",
" sell | \n",
" 3150 | \n",
" 9660 | \n",
"
\n",
" \n",
" C | \n",
" buy | \n",
" 56800 | \n",
" 203600 | \n",
"
\n",
" \n",
" sell | \n",
" 45000 | \n",
" 101800 | \n",
"
\n",
" \n",
" E | \n",
" buy | \n",
" 16800 | \n",
" 123200 | \n",
"
\n",
" \n",
" sell | \n",
" 14000 | \n",
" 154000 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Volume addition\n",
"Symbol flag \n",
"A buy 2500 15153\n",
" sell 2551 15153\n",
"B buy 1680 9660\n",
" sell 3150 9660\n",
"C buy 56800 203600\n",
" sell 45000 101800\n",
"E buy 16800 123200\n",
" sell 14000 154000"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# create group object, remember to pass a list when grouping by multiple columns\n",
"group = df.groupby(['Symbol', 'flag'])\n",
"\n",
"# perform a sum function on the group\n",
"group.sum(numeric_only=True)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"data": {
"text/plain": [
"Symbol flag\n",
"A buy 2500\n",
" sell 2551\n",
"B buy 1680\n",
" sell 3150\n",
"C buy 56800\n",
" sell 45000\n",
"E buy 16800\n",
" sell 14000\n",
"dtype: int64"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# apply a function to the multi column group\n",
"def suma(group):\n",
" return group['Volume'].sum(numeric_only=True)\n",
"\n",
"group.apply(suma, include_groups=False)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Date | \n",
" Symbol | \n",
" Volume | \n",
" flag | \n",
" addition | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 2013-01-01 | \n",
" A | \n",
" 0 | \n",
" buy | \n",
" 2500 | \n",
"
\n",
" \n",
" 1 | \n",
" 2013-01-02 | \n",
" A | \n",
" 200 | \n",
" sell | \n",
" 2551 | \n",
"
\n",
" \n",
" 2 | \n",
" 2013-01-03 | \n",
" A | \n",
" 1200 | \n",
" buy | \n",
" 2500 | \n",
"
\n",
" \n",
" 3 | \n",
" 2013-01-04 | \n",
" A | \n",
" 1001 | \n",
" sell | \n",
" 2551 | \n",
"
\n",
" \n",
" 4 | \n",
" 2013-01-05 | \n",
" A | \n",
" 1300 | \n",
" buy | \n",
" 2500 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Date Symbol Volume flag addition\n",
"0 2013-01-01 A 0 buy 2500\n",
"1 2013-01-02 A 200 sell 2551\n",
"2 2013-01-03 A 1200 buy 2500\n",
"3 2013-01-04 A 1001 sell 2551\n",
"4 2013-01-05 A 1300 buy 2500"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"group = df[['Symbol','Volume','flag']].groupby(['Symbol', 'flag'])\n",
"\n",
"def suma(group):\n",
" return group.sum(numeric_only=True)\n",
"\n",
"df['addition'] = group.transform(suma)['Volume']\n",
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" | \n",
" | \n",
" Date | \n",
" Volume | \n",
" addition | \n",
"
\n",
" \n",
" Symbol | \n",
" flag | \n",
" | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" A | \n",
" buy | \n",
" 2 | \n",
" 2013-01-03 | \n",
" 1200 | \n",
" 2500 | \n",
"
\n",
" \n",
" 4 | \n",
" 2013-01-05 | \n",
" 1300 | \n",
" 2500 | \n",
"
\n",
" \n",
" sell | \n",
" 3 | \n",
" 2013-01-04 | \n",
" 1001 | \n",
" 2551 | \n",
"
\n",
" \n",
" 5 | \n",
" 2013-01-06 | \n",
" 1350 | \n",
" 2551 | \n",
"
\n",
" \n",
" B | \n",
" buy | \n",
" 8 | \n",
" 2013-03-10 | \n",
" 1180 | \n",
" 1680 | \n",
"
\n",
" \n",
" sell | \n",
" 7 | \n",
" 2013-03-09 | \n",
" 1150 | \n",
" 3150 | \n",
"
\n",
" \n",
" 9 | \n",
" 2013-03-11 | \n",
" 2000 | \n",
" 3150 | \n",
"
\n",
" \n",
" C | \n",
" buy | \n",
" 10 | \n",
" 2013-01-05 | \n",
" 56600 | \n",
" 56800 | \n",
"
\n",
" \n",
" sell | \n",
" 11 | \n",
" 2013-01-06 | \n",
" 45000 | \n",
" 45000 | \n",
"
\n",
" \n",
" E | \n",
" buy | \n",
" 14 | \n",
" 2013-05-21 | \n",
" 1700 | \n",
" 16800 | \n",
"
\n",
" \n",
" 16 | \n",
" 2013-05-23 | \n",
" 2100 | \n",
" 16800 | \n",
"
\n",
" \n",
" 18 | \n",
" 2013-05-25 | \n",
" 12000 | \n",
" 16800 | \n",
"
\n",
" \n",
" sell | \n",
" 13 | \n",
" 2013-05-20 | \n",
" 1300 | \n",
" 14000 | \n",
"
\n",
" \n",
" 17 | \n",
" 2013-05-24 | \n",
" 8000 | \n",
" 14000 | \n",
"
\n",
" \n",
" 19 | \n",
" 2013-05-26 | \n",
" 1900 | \n",
" 14000 | \n",
"
\n",
" \n",
" 21 | \n",
" 2013-05-28 | \n",
" 1900 | \n",
" 14000 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Date Volume addition\n",
"Symbol flag \n",
"A buy 2 2013-01-03 1200 2500\n",
" 4 2013-01-05 1300 2500\n",
" sell 3 2013-01-04 1001 2551\n",
" 5 2013-01-06 1350 2551\n",
"B buy 8 2013-03-10 1180 1680\n",
" sell 7 2013-03-09 1150 3150\n",
" 9 2013-03-11 2000 3150\n",
"C buy 10 2013-01-05 56600 56800\n",
" sell 11 2013-01-06 45000 45000\n",
"E buy 14 2013-05-21 1700 16800\n",
" 16 2013-05-23 2100 16800\n",
" 18 2013-05-25 12000 16800\n",
" sell 13 2013-05-20 1300 14000\n",
" 17 2013-05-24 8000 14000\n",
" 19 2013-05-26 1900 14000\n",
" 21 2013-05-28 1900 14000"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"group = df.groupby(['Symbol', 'flag'])\n",
"\n",
"def test(group):\n",
" mask1 = group.apply(lambda x: x.iloc[1]>1000, axis=1)\n",
" return group[mask1]\n",
"\n",
"group.apply(test)"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"source": [
"This tutorial was created by HEDARO
"
]
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.7"
}
},
"nbformat": 4,
"nbformat_minor": 4
}