{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Series Vectorization and Broadcasting\n",
"\n",
"Just like NumPy, pandas offers powerful vectorized methods and leans on broadcasting.\n",
"\n",
"Let's explore!"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"test_balance_data = {\n",
" 'pasan': 20.00,\n",
" 'treasure': 20.18,\n",
" 'ashley': 1.05,\n",
" 'craig': 42.42,\n",
"}\n",
"\n",
"test_deposit_data = {\n",
" 'pasan': 20,\n",
" 'treasure': 10,\n",
" 'ashley': 100,\n",
" 'craig': 55, \n",
"}\n",
"\n",
"balances = pd.Series(test_balance_data)\n",
"deposits = pd.Series(test_deposit_data)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Vectorization\n",
"While it is possible to loop through each item in one `Series` and apply it to another..."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"pasan 40.00\n",
"treasure 30.18\n",
"ashley 101.05\n",
"craig 97.42\n",
"dtype: float64"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"for label, value in deposits.items():\n",
" balances[label] += value\n",
"balances"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"...it's important to remember to lean on vectorization and skip the loops altogether."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"pasan 40.00\n",
"treasure 30.18\n",
"ashley 101.05\n",
"craig 97.42\n",
"dtype: float64"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Undo the loop's change using in-place subtraction\n",
"balances -= deposits\n",
"\n",
"# Vectorized in-place addition; same result as the loop above\n",
"balances += deposits\n",
"balances"
]
},
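{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick check, the loop and the vectorized addition can be compared side by side. This is a minimal sketch, not part of the original walkthrough: it works on a copy (the names `looped` and `vectorized` are made up for illustration) so that `balances` stays unchanged for the cells that follow."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical names for illustration; work on a copy so `balances` is not modified\n",
"looped = balances.copy()\n",
"for label, value in deposits.items():\n",
"    looped[label] += value\n",
"\n",
"# The vectorized form returns a new Series in one step\n",
"vectorized = balances + deposits\n",
"\n",
"# Compare the two results element by element\n",
"looped.equals(vectorized)"
]
},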
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Broadcasting"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Broadcasting a Scalar\n",
"Just like NumPy arrays, a `Series` overrides the mathematical operators to use the vectorized version of the same operation."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"pasan 45.00\n",
"treasure 35.18\n",
"ashley 106.05\n",
"craig 102.42\n",
"dtype: float64"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# 5 is broadcast and added to every value. This returns a new Series.\n",
"balances + 5"
]
},
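{
"cell_type": "markdown",
"metadata": {},
"source": [
"The same broadcasting applies to the other arithmetic and comparison operators. The sketch below assumes a made-up 1% bonus and a made-up threshold of 50 purely for illustration; neither expression modifies `balances`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical 1% bonus: 1.01 is broadcast across every value\n",
"with_bonus = balances * 1.01\n",
"\n",
"# Comparisons broadcast too, producing a boolean Series\n",
"over_fifty = balances > 50\n",
"\n",
"with_bonus, over_fifty"
]
},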
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Broadcasting a Series\n",
"Labels are used to line up entries. When a label exists on only one side, `np.nan` (not a number) is put in its place.\n",
"\n",
"CashBox is giving out free coupons that users can scan into the app to get $1 added to their accounts."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"craig 1\n",
"ashley 1\n",
"james 1\n",
"dtype: int64"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"coupons = pd.Series(1, index=['craig', 'ashley', 'james'])\n",
"coupons"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we are going to add the coupons to the balances of the people who cashed them in. This addition will return a new `Series`."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"ashley 102.05\n",
"craig 98.42\n",
"james NaN\n",
"pasan NaN\n",
"treasure NaN\n",
"dtype: float64"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Returns a new Series\n",
"balances + coupons"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Notice how labels that are not in both `Series` end up as `np.nan`. This isn't what we want! Pasan had $40.00 and now his balance is missing. He is going to be so bummed!\n",
"\n",
"Also take note that James is not in the **`balances`** `Series` but he is in the **`coupons`** `Series`. He is added to the new `Series`, but his value is also set to `np.nan`."
]
},
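{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see the alignment step on its own, `Series.align` can be called directly. This is a sketch for illustration (the variable names are made up): with its default outer join, each result is reindexed to the union of both indexes, and labels missing from one side show up as `np.nan` before any addition happens."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Reindex both Series to the union of their labels (outer join is the default)\n",
"aligned_balances, aligned_coupons = balances.align(coupons)\n",
"\n",
"# Labels missing from at least one side; these become NaN in `balances + coupons`\n",
"missing = aligned_balances.isna() | aligned_coupons.isna()\n",
"missing[missing].index.tolist()"
]
},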
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Using `fill_value`\n",
"It is possible to fill missing values so that everything aligns. The idea is to call the `add` method directly with the keyword argument `fill_value`, which stands in for a value missing on either side before the addition happens."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"ashley 102.05\n",
"craig 98.42\n",
"james 1.00\n",
"pasan 40.00\n",
"treasure 30.18\n",
"dtype: float64"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Returns a new Series\n",
"balances.add(coupons, fill_value=0)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.0"
}
},
"nbformat": 4,
"nbformat_minor": 2
}