{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Series Vectorization and Broadcasting\n", "\n", "Just like NumPy, pandas offers powerful vectorized methods. It also leans on broadcasting.\n", "\n", "Let's explore!" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "\n", "test_balance_data = {\n", " 'pasan': 20.00,\n", " 'treasure': 20.18,\n", " 'ashley': 1.05,\n", " 'craig': 42.42,\n", "}\n", "\n", "test_deposit_data = {\n", " 'pasan': 20,\n", " 'treasure': 10,\n", " 'ashley': 100,\n", " 'craig': 55, \n", "}\n", "\n", "balances = pd.Series(test_balance_data)\n", "deposits = pd.Series(test_deposit_data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Vectorization\n", "While it is indeed possible to loop through each item and apply it to another..." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "pasan 40.00\n", "treasure 30.18\n", "ashley 101.05\n", "craig 97.42\n", "dtype: float64" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "for label, value in deposits.iteritems():\n", " balances[label] += value\n", "balances" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "...it's important to remember to lean on vectorization and skip the loops altogether. Vectorization is faster and as you can see, easier to read and write." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "pasan 40.00\n", "treasure 30.18\n", "ashley 101.05\n", "craig 97.42\n", "dtype: float64" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Undo the change using inplace subtraction\n", "balances -= deposits\n", "\n", "# This is the same as the loop above using inplace addition\n", "balances += deposits\n", "balances" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Broadcasting" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Broadcasting a Scalar\n", "Also just like NumPy arrays, the mathematical operators have been overridden to use the vectorized versions of the same operation." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "pasan 45.00\n", "treasure 35.18\n", "ashley 106.05\n", "craig 102.42\n", "dtype: float64" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 5 is brodacsted and added to each and every value. This returns a new Series.\n", "balances + 5" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Broadcasting a Series\n", "Labels are used to line up entries. When the label only exists in one side, a `np.nan` (not a number ) is put in place.\n", "\n", "CashBox is giving out free coupons that user's can scan into the app to get $1 added to their accounts." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "craig 1\n", "ashley 1\n", "james 1\n", "dtype: int64" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "coupons = pd.Series(1, ['craig', 'ashley', 'james'])\n", "coupons" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we are going to add the coupons to people who cashed them in. This addition will return a new `Series`. " ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "ashley 102.05\n", "craig 98.42\n", "james NaN\n", "pasan NaN\n", "treasure NaN\n", "dtype: float64" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Returns a new Series\n", "balances + coupons" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice how values that are not in both `Series` are set to `np.nan`. This isn't what we want! Pasan had $45.00 and now he has nothing. He is going to be so bummed!\n", "\n", "Also take note that James is not in the **`balances`** `Series` but he is in the **`coupons`** `Series`. Note how he is now added to the new `Series`, but his value is also set to `np.nan`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Using the `fill_value` parameter\n", "It is possible to fill missing values so that everything aligns. The concept is to use the `add` method directly along with the the keyword argument `fill_value`." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "ashley 102.05\n", "craig 98.42\n", "james 1.00\n", "pasan 40.00\n", "treasure 30.18\n", "dtype: float64" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Returns a new Series\n", "balances.add(coupons, fill_value=0)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.0" } }, "nbformat": 4, "nbformat_minor": 2 }