{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Introduction"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The first part of this [series](http://pbpython.com/excel-pandas-comp.html) was very well received so I thought I would continue the theme of showing how to do common Excel tasks in pandas.\n",
"\n",
"In the first article, I focused on common, math tasks in Excel and how to do them in pandas. In this article, I'll focus on some other Excel tasks related to data selection and how to map them to pandas. \n",
"\n",
"Please refer to [this post](http://pbpython.com/excel-pandas-comp-2.html) for the full post."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Getting Set Up"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Import the pandas and numpy modules."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Load in the Excel data that represents a year's worth of sales."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"df = pd.read_excel('https://github.com/chris1610/pbpython/blob/master/data/sample-salesv3.xlsx?raw=true')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Take a quick look at the data types to make sure everything came through as expected."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"account number int64\n",
"name object\n",
"sku object\n",
"quantity int64\n",
"unit price float64\n",
"ext price float64\n",
"date object\n",
"dtype: object"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.dtypes"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You'll notice that our date column is showing up as a generic `object`. We are going to convert it to datetime object to make some selections a little easier."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"df['date'] = pd.to_datetime(df['date'])"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" account number | \n",
" name | \n",
" sku | \n",
" quantity | \n",
" unit price | \n",
" ext price | \n",
" date | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 740150 | \n",
" Barton LLC | \n",
" B1-20000 | \n",
" 39 | \n",
" 86.69 | \n",
" 3380.91 | \n",
" 2014-01-01 07:21:51 | \n",
"
\n",
" \n",
" 1 | \n",
" 714466 | \n",
" Trantow-Barrows | \n",
" S2-77896 | \n",
" -1 | \n",
" 63.16 | \n",
" -63.16 | \n",
" 2014-01-01 10:00:47 | \n",
"
\n",
" \n",
" 2 | \n",
" 218895 | \n",
" Kulas Inc | \n",
" B1-69924 | \n",
" 23 | \n",
" 90.70 | \n",
" 2086.10 | \n",
" 2014-01-01 13:24:58 | \n",
"
\n",
" \n",
" 3 | \n",
" 307599 | \n",
" Kassulke, Ondricka and Metz | \n",
" S1-65481 | \n",
" 41 | \n",
" 21.05 | \n",
" 863.05 | \n",
" 2014-01-01 15:05:22 | \n",
"
\n",
" \n",
" 4 | \n",
" 412290 | \n",
" Jerde-Hilpert | \n",
" S2-34077 | \n",
" 6 | \n",
" 83.21 | \n",
" 499.26 | \n",
" 2014-01-01 23:26:55 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" account number name sku quantity \\\n",
"0 740150 Barton LLC B1-20000 39 \n",
"1 714466 Trantow-Barrows S2-77896 -1 \n",
"2 218895 Kulas Inc B1-69924 23 \n",
"3 307599 Kassulke, Ondricka and Metz S1-65481 41 \n",
"4 412290 Jerde-Hilpert S2-34077 6 \n",
"\n",
" unit price ext price date \n",
"0 86.69 3380.91 2014-01-01 07:21:51 \n",
"1 63.16 -63.16 2014-01-01 10:00:47 \n",
"2 90.70 2086.10 2014-01-01 13:24:58 \n",
"3 21.05 863.05 2014-01-01 15:05:22 \n",
"4 83.21 499.26 2014-01-01 23:26:55 "
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"account number int64\n",
"name object\n",
"sku object\n",
"quantity int64\n",
"unit price float64\n",
"ext price float64\n",
"date datetime64[ns]\n",
"dtype: object"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.dtypes"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The date is now a datetime object which will be useful in future steps."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Filtering the data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Similar to the autofilter function in Excel, you can use pandas to filter and select certain subsets of data.\n",
"\n",
"For instance, if we want to just see a specific account number, we can easily do that with pandas.\n",
"\n",
"Note, I am going to use the `head` function to show the top results. This is purely for the purposes of keeping the article shorter."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" account number | \n",
" name | \n",
" sku | \n",
" quantity | \n",
" unit price | \n",
" ext price | \n",
" date | \n",
"
\n",
" \n",
" \n",
" \n",
" 3 | \n",
" 307599 | \n",
" Kassulke, Ondricka and Metz | \n",
" S1-65481 | \n",
" 41 | \n",
" 21.05 | \n",
" 863.05 | \n",
" 2014-01-01 15:05:22 | \n",
"
\n",
" \n",
" 13 | \n",
" 307599 | \n",
" Kassulke, Ondricka and Metz | \n",
" S2-10342 | \n",
" 17 | \n",
" 12.44 | \n",
" 211.48 | \n",
" 2014-01-04 07:53:01 | \n",
"
\n",
" \n",
" 34 | \n",
" 307599 | \n",
" Kassulke, Ondricka and Metz | \n",
" S2-78676 | \n",
" 35 | \n",
" 33.04 | \n",
" 1156.40 | \n",
" 2014-01-10 05:26:31 | \n",
"
\n",
" \n",
" 58 | \n",
" 307599 | \n",
" Kassulke, Ondricka and Metz | \n",
" B1-20000 | \n",
" 22 | \n",
" 37.87 | \n",
" 833.14 | \n",
" 2014-01-15 16:22:22 | \n",
"
\n",
" \n",
" 70 | \n",
" 307599 | \n",
" Kassulke, Ondricka and Metz | \n",
" S2-10342 | \n",
" 44 | \n",
" 96.79 | \n",
" 4258.76 | \n",
" 2014-01-18 06:32:31 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" account number name sku quantity \\\n",
"3 307599 Kassulke, Ondricka and Metz S1-65481 41 \n",
"13 307599 Kassulke, Ondricka and Metz S2-10342 17 \n",
"34 307599 Kassulke, Ondricka and Metz S2-78676 35 \n",
"58 307599 Kassulke, Ondricka and Metz B1-20000 22 \n",
"70 307599 Kassulke, Ondricka and Metz S2-10342 44 \n",
"\n",
" unit price ext price date \n",
"3 21.05 863.05 2014-01-01 15:05:22 \n",
"13 12.44 211.48 2014-01-04 07:53:01 \n",
"34 33.04 1156.40 2014-01-10 05:26:31 \n",
"58 37.87 833.14 2014-01-15 16:22:22 \n",
"70 96.79 4258.76 2014-01-18 06:32:31 "
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[df[\"account number\"]==307599].head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You could also do the filtering based on numeric values."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" account number | \n",
" name | \n",
" sku | \n",
" quantity | \n",
" unit price | \n",
" ext price | \n",
" date | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 740150 | \n",
" Barton LLC | \n",
" B1-20000 | \n",
" 39 | \n",
" 86.69 | \n",
" 3380.91 | \n",
" 2014-01-01 07:21:51 | \n",
"
\n",
" \n",
" 2 | \n",
" 218895 | \n",
" Kulas Inc | \n",
" B1-69924 | \n",
" 23 | \n",
" 90.70 | \n",
" 2086.10 | \n",
" 2014-01-01 13:24:58 | \n",
"
\n",
" \n",
" 3 | \n",
" 307599 | \n",
" Kassulke, Ondricka and Metz | \n",
" S1-65481 | \n",
" 41 | \n",
" 21.05 | \n",
" 863.05 | \n",
" 2014-01-01 15:05:22 | \n",
"
\n",
" \n",
" 14 | \n",
" 737550 | \n",
" Fritsch, Russel and Anderson | \n",
" B1-53102 | \n",
" 23 | \n",
" 71.56 | \n",
" 1645.88 | \n",
" 2014-01-04 08:57:48 | \n",
"
\n",
" \n",
" 15 | \n",
" 239344 | \n",
" Stokes LLC | \n",
" S1-06532 | \n",
" 34 | \n",
" 71.51 | \n",
" 2431.34 | \n",
" 2014-01-04 11:34:58 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" account number name sku quantity \\\n",
"0 740150 Barton LLC B1-20000 39 \n",
"2 218895 Kulas Inc B1-69924 23 \n",
"3 307599 Kassulke, Ondricka and Metz S1-65481 41 \n",
"14 737550 Fritsch, Russel and Anderson B1-53102 23 \n",
"15 239344 Stokes LLC S1-06532 34 \n",
"\n",
" unit price ext price date \n",
"0 86.69 3380.91 2014-01-01 07:21:51 \n",
"2 90.70 2086.10 2014-01-01 13:24:58 \n",
"3 21.05 863.05 2014-01-01 15:05:22 \n",
"14 71.56 1645.88 2014-01-04 08:57:48 \n",
"15 71.51 2431.34 2014-01-04 11:34:58 "
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[df[\"quantity\"] > 22].head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If we want to do more complex filtering, we can use `map` to filter. In this example, let's look for items with sku's that start with B1."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" account number | \n",
" name | \n",
" sku | \n",
" quantity | \n",
" unit price | \n",
" ext price | \n",
" date | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 740150 | \n",
" Barton LLC | \n",
" B1-20000 | \n",
" 39 | \n",
" 86.69 | \n",
" 3380.91 | \n",
" 2014-01-01 07:21:51 | \n",
"
\n",
" \n",
" 2 | \n",
" 218895 | \n",
" Kulas Inc | \n",
" B1-69924 | \n",
" 23 | \n",
" 90.70 | \n",
" 2086.10 | \n",
" 2014-01-01 13:24:58 | \n",
"
\n",
" \n",
" 6 | \n",
" 218895 | \n",
" Kulas Inc | \n",
" B1-65551 | \n",
" 2 | \n",
" 31.10 | \n",
" 62.20 | \n",
" 2014-01-02 10:57:23 | \n",
"
\n",
" \n",
" 14 | \n",
" 737550 | \n",
" Fritsch, Russel and Anderson | \n",
" B1-53102 | \n",
" 23 | \n",
" 71.56 | \n",
" 1645.88 | \n",
" 2014-01-04 08:57:48 | \n",
"
\n",
" \n",
" 17 | \n",
" 239344 | \n",
" Stokes LLC | \n",
" B1-50809 | \n",
" 14 | \n",
" 16.23 | \n",
" 227.22 | \n",
" 2014-01-04 22:14:32 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" account number name sku quantity \\\n",
"0 740150 Barton LLC B1-20000 39 \n",
"2 218895 Kulas Inc B1-69924 23 \n",
"6 218895 Kulas Inc B1-65551 2 \n",
"14 737550 Fritsch, Russel and Anderson B1-53102 23 \n",
"17 239344 Stokes LLC B1-50809 14 \n",
"\n",
" unit price ext price date \n",
"0 86.69 3380.91 2014-01-01 07:21:51 \n",
"2 90.70 2086.10 2014-01-01 13:24:58 \n",
"6 31.10 62.20 2014-01-02 10:57:23 \n",
"14 71.56 1645.88 2014-01-04 08:57:48 \n",
"17 16.23 227.22 2014-01-04 22:14:32 "
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[df[\"sku\"].map(lambda x: x.startswith('B1'))].head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It's easy to chain two statements together using the &."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" account number | \n",
" name | \n",
" sku | \n",
" quantity | \n",
" unit price | \n",
" ext price | \n",
" date | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 740150 | \n",
" Barton LLC | \n",
" B1-20000 | \n",
" 39 | \n",
" 86.69 | \n",
" 3380.91 | \n",
" 2014-01-01 07:21:51 | \n",
"
\n",
" \n",
" 2 | \n",
" 218895 | \n",
" Kulas Inc | \n",
" B1-69924 | \n",
" 23 | \n",
" 90.70 | \n",
" 2086.10 | \n",
" 2014-01-01 13:24:58 | \n",
"
\n",
" \n",
" 14 | \n",
" 737550 | \n",
" Fritsch, Russel and Anderson | \n",
" B1-53102 | \n",
" 23 | \n",
" 71.56 | \n",
" 1645.88 | \n",
" 2014-01-04 08:57:48 | \n",
"
\n",
" \n",
" 26 | \n",
" 737550 | \n",
" Fritsch, Russel and Anderson | \n",
" B1-53636 | \n",
" 42 | \n",
" 42.06 | \n",
" 1766.52 | \n",
" 2014-01-08 00:02:11 | \n",
"
\n",
" \n",
" 31 | \n",
" 714466 | \n",
" Trantow-Barrows | \n",
" B1-33087 | \n",
" 32 | \n",
" 19.56 | \n",
" 625.92 | \n",
" 2014-01-09 10:16:32 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" account number name sku quantity \\\n",
"0 740150 Barton LLC B1-20000 39 \n",
"2 218895 Kulas Inc B1-69924 23 \n",
"14 737550 Fritsch, Russel and Anderson B1-53102 23 \n",
"26 737550 Fritsch, Russel and Anderson B1-53636 42 \n",
"31 714466 Trantow-Barrows B1-33087 32 \n",
"\n",
" unit price ext price date \n",
"0 86.69 3380.91 2014-01-01 07:21:51 \n",
"2 90.70 2086.10 2014-01-01 13:24:58 \n",
"14 71.56 1645.88 2014-01-04 08:57:48 \n",
"26 42.06 1766.52 2014-01-08 00:02:11 \n",
"31 19.56 625.92 2014-01-09 10:16:32 "
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[df[\"sku\"].map(lambda x: x.startswith('B1')) & (df[\"quantity\"] > 22)].head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Another useful function that pandas supports is called `isin`. It allows us to define a list of values we want to look for.\n",
"\n",
"In this case, we look for all records that include two specific account numbers."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" account number | \n",
" name | \n",
" sku | \n",
" quantity | \n",
" unit price | \n",
" ext price | \n",
" date | \n",
"
\n",
" \n",
" \n",
" \n",
" 1 | \n",
" 714466 | \n",
" Trantow-Barrows | \n",
" S2-77896 | \n",
" -1 | \n",
" 63.16 | \n",
" -63.16 | \n",
" 2014-01-01 10:00:47 | \n",
"
\n",
" \n",
" 2 | \n",
" 218895 | \n",
" Kulas Inc | \n",
" B1-69924 | \n",
" 23 | \n",
" 90.70 | \n",
" 2086.10 | \n",
" 2014-01-01 13:24:58 | \n",
"
\n",
" \n",
" 5 | \n",
" 714466 | \n",
" Trantow-Barrows | \n",
" S2-77896 | \n",
" 17 | \n",
" 87.63 | \n",
" 1489.71 | \n",
" 2014-01-02 10:07:15 | \n",
"
\n",
" \n",
" 6 | \n",
" 218895 | \n",
" Kulas Inc | \n",
" B1-65551 | \n",
" 2 | \n",
" 31.10 | \n",
" 62.20 | \n",
" 2014-01-02 10:57:23 | \n",
"
\n",
" \n",
" 8 | \n",
" 714466 | \n",
" Trantow-Barrows | \n",
" S1-50961 | \n",
" 22 | \n",
" 84.09 | \n",
" 1849.98 | \n",
" 2014-01-03 11:29:02 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" account number name sku quantity unit price ext price \\\n",
"1 714466 Trantow-Barrows S2-77896 -1 63.16 -63.16 \n",
"2 218895 Kulas Inc B1-69924 23 90.70 2086.10 \n",
"5 714466 Trantow-Barrows S2-77896 17 87.63 1489.71 \n",
"6 218895 Kulas Inc B1-65551 2 31.10 62.20 \n",
"8 714466 Trantow-Barrows S1-50961 22 84.09 1849.98 \n",
"\n",
" date \n",
"1 2014-01-01 10:00:47 \n",
"2 2014-01-01 13:24:58 \n",
"5 2014-01-02 10:07:15 \n",
"6 2014-01-02 10:57:23 \n",
"8 2014-01-03 11:29:02 "
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[df[\"account number\"].isin([714466,218895])].head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Pandas supports another function called `query` which allows you to efficiently select subsets of data. It does require the installation of [numexpr](https://github.com/pydata/numexpr) so make sure you have it installed before trying this step.\n",
"\n",
"If you would like to get a list of customers by name, you can do that with a query, similar to the python syntax shown above."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" account number | \n",
" name | \n",
" sku | \n",
" quantity | \n",
" unit price | \n",
" ext price | \n",
" date | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 740150 | \n",
" Barton LLC | \n",
" B1-20000 | \n",
" 39 | \n",
" 86.69 | \n",
" 3380.91 | \n",
" 2014-01-01 07:21:51 | \n",
"
\n",
" \n",
" 2 | \n",
" 218895 | \n",
" Kulas Inc | \n",
" B1-69924 | \n",
" 23 | \n",
" 90.70 | \n",
" 2086.10 | \n",
" 2014-01-01 13:24:58 | \n",
"
\n",
" \n",
" 6 | \n",
" 218895 | \n",
" Kulas Inc | \n",
" B1-65551 | \n",
" 2 | \n",
" 31.10 | \n",
" 62.20 | \n",
" 2014-01-02 10:57:23 | \n",
"
\n",
" \n",
" 33 | \n",
" 218895 | \n",
" Kulas Inc | \n",
" S1-06532 | \n",
" 3 | \n",
" 22.36 | \n",
" 67.08 | \n",
" 2014-01-09 23:58:27 | \n",
"
\n",
" \n",
" 36 | \n",
" 218895 | \n",
" Kulas Inc | \n",
" S2-34077 | \n",
" 16 | \n",
" 73.04 | \n",
" 1168.64 | \n",
" 2014-01-10 12:07:30 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" account number name sku quantity unit price ext price \\\n",
"0 740150 Barton LLC B1-20000 39 86.69 3380.91 \n",
"2 218895 Kulas Inc B1-69924 23 90.70 2086.10 \n",
"6 218895 Kulas Inc B1-65551 2 31.10 62.20 \n",
"33 218895 Kulas Inc S1-06532 3 22.36 67.08 \n",
"36 218895 Kulas Inc S2-34077 16 73.04 1168.64 \n",
"\n",
" date \n",
"0 2014-01-01 07:21:51 \n",
"2 2014-01-01 13:24:58 \n",
"6 2014-01-02 10:57:23 \n",
"33 2014-01-09 23:58:27 \n",
"36 2014-01-10 12:07:30 "
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.query('name == [\"Kulas Inc\",\"Barton LLC\"]').head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The query function allows you do more than just this simple example but for the purposes of this discussion, I'm showing it so you are aware that it is out there for you."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Working with Dates"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Using pandas, you can do complex filtering on dates. Before doing anything with dates, I encourage you to sort by the date column to make sure the results return what you are expecting."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" account number | \n",
" name | \n",
" sku | \n",
" quantity | \n",
" unit price | \n",
" ext price | \n",
" date | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 740150 | \n",
" Barton LLC | \n",
" B1-20000 | \n",
" 39 | \n",
" 86.69 | \n",
" 3380.91 | \n",
" 2014-01-01 07:21:51 | \n",
"
\n",
" \n",
" 1 | \n",
" 714466 | \n",
" Trantow-Barrows | \n",
" S2-77896 | \n",
" -1 | \n",
" 63.16 | \n",
" -63.16 | \n",
" 2014-01-01 10:00:47 | \n",
"
\n",
" \n",
" 2 | \n",
" 218895 | \n",
" Kulas Inc | \n",
" B1-69924 | \n",
" 23 | \n",
" 90.70 | \n",
" 2086.10 | \n",
" 2014-01-01 13:24:58 | \n",
"
\n",
" \n",
" 3 | \n",
" 307599 | \n",
" Kassulke, Ondricka and Metz | \n",
" S1-65481 | \n",
" 41 | \n",
" 21.05 | \n",
" 863.05 | \n",
" 2014-01-01 15:05:22 | \n",
"
\n",
" \n",
" 4 | \n",
" 412290 | \n",
" Jerde-Hilpert | \n",
" S2-34077 | \n",
" 6 | \n",
" 83.21 | \n",
" 499.26 | \n",
" 2014-01-01 23:26:55 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" account number name sku quantity \\\n",
"0 740150 Barton LLC B1-20000 39 \n",
"1 714466 Trantow-Barrows S2-77896 -1 \n",
"2 218895 Kulas Inc B1-69924 23 \n",
"3 307599 Kassulke, Ondricka and Metz S1-65481 41 \n",
"4 412290 Jerde-Hilpert S2-34077 6 \n",
"\n",
" unit price ext price date \n",
"0 86.69 3380.91 2014-01-01 07:21:51 \n",
"1 63.16 -63.16 2014-01-01 10:00:47 \n",
"2 90.70 2086.10 2014-01-01 13:24:58 \n",
"3 21.05 863.05 2014-01-01 15:05:22 \n",
"4 83.21 499.26 2014-01-01 23:26:55 "
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = df.sort_values(by='date')\n",
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The python filtering syntax shown before works with dates."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" account number | \n",
" name | \n",
" sku | \n",
" quantity | \n",
" unit price | \n",
" ext price | \n",
" date | \n",
"
\n",
" \n",
" \n",
" \n",
" 1042 | \n",
" 163416 | \n",
" Purdy-Kunde | \n",
" B1-38851 | \n",
" 41 | \n",
" 98.69 | \n",
" 4046.29 | \n",
" 2014-09-05 01:52:32 | \n",
"
\n",
" \n",
" 1043 | \n",
" 714466 | \n",
" Trantow-Barrows | \n",
" S1-30248 | \n",
" 1 | \n",
" 37.16 | \n",
" 37.16 | \n",
" 2014-09-05 06:17:19 | \n",
"
\n",
" \n",
" 1044 | \n",
" 729833 | \n",
" Koepp Ltd | \n",
" S1-65481 | \n",
" 48 | \n",
" 16.04 | \n",
" 769.92 | \n",
" 2014-09-05 08:54:41 | \n",
"
\n",
" \n",
" 1045 | \n",
" 729833 | \n",
" Koepp Ltd | \n",
" S2-11481 | \n",
" 6 | \n",
" 26.50 | \n",
" 159.00 | \n",
" 2014-09-05 16:33:15 | \n",
"
\n",
" \n",
" 1046 | \n",
" 737550 | \n",
" Fritsch, Russel and Anderson | \n",
" B1-33364 | \n",
" 4 | \n",
" 76.44 | \n",
" 305.76 | \n",
" 2014-09-06 08:59:08 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" account number name sku quantity \\\n",
"1042 163416 Purdy-Kunde B1-38851 41 \n",
"1043 714466 Trantow-Barrows S1-30248 1 \n",
"1044 729833 Koepp Ltd S1-65481 48 \n",
"1045 729833 Koepp Ltd S2-11481 6 \n",
"1046 737550 Fritsch, Russel and Anderson B1-33364 4 \n",
"\n",
" unit price ext price date \n",
"1042 98.69 4046.29 2014-09-05 01:52:32 \n",
"1043 37.16 37.16 2014-09-05 06:17:19 \n",
"1044 16.04 769.92 2014-09-05 08:54:41 \n",
"1045 26.50 159.00 2014-09-05 16:33:15 \n",
"1046 76.44 305.76 2014-09-06 08:59:08 "
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[df['date'] >='20140905'].head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"One of the really nice features of pandas is that it understands dates so will allow us to do partial filtering. If we want to only look for data more recent than a specific month, we can do so."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" account number | \n",
" name | \n",
" sku | \n",
" quantity | \n",
" unit price | \n",
" ext price | \n",
" date | \n",
"
\n",
" \n",
" \n",
" \n",
" 242 | \n",
" 163416 | \n",
" Purdy-Kunde | \n",
" S1-30248 | \n",
" 19 | \n",
" 65.03 | \n",
" 1235.57 | \n",
" 2014-03-01 16:07:40 | \n",
"
\n",
" \n",
" 243 | \n",
" 527099 | \n",
" Sanford and Sons | \n",
" S2-82423 | \n",
" 3 | \n",
" 76.21 | \n",
" 228.63 | \n",
" 2014-03-01 17:18:01 | \n",
"
\n",
" \n",
" 244 | \n",
" 527099 | \n",
" Sanford and Sons | \n",
" B1-50809 | \n",
" 8 | \n",
" 70.78 | \n",
" 566.24 | \n",
" 2014-03-01 18:53:09 | \n",
"
\n",
" \n",
" 245 | \n",
" 737550 | \n",
" Fritsch, Russel and Anderson | \n",
" B1-50809 | \n",
" 20 | \n",
" 50.11 | \n",
" 1002.20 | \n",
" 2014-03-01 23:47:17 | \n",
"
\n",
" \n",
" 246 | \n",
" 688981 | \n",
" Keeling LLC | \n",
" B1-86481 | \n",
" -1 | \n",
" 97.16 | \n",
" -97.16 | \n",
" 2014-03-02 01:46:44 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" account number name sku quantity \\\n",
"242 163416 Purdy-Kunde S1-30248 19 \n",
"243 527099 Sanford and Sons S2-82423 3 \n",
"244 527099 Sanford and Sons B1-50809 8 \n",
"245 737550 Fritsch, Russel and Anderson B1-50809 20 \n",
"246 688981 Keeling LLC B1-86481 -1 \n",
"\n",
" unit price ext price date \n",
"242 65.03 1235.57 2014-03-01 16:07:40 \n",
"243 76.21 228.63 2014-03-01 17:18:01 \n",
"244 70.78 566.24 2014-03-01 18:53:09 \n",
"245 50.11 1002.20 2014-03-01 23:47:17 \n",
"246 97.16 -97.16 2014-03-02 01:46:44 "
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[df['date'] >='2014-03'].head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Of course, you can chain the criteria."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" account number | \n",
" name | \n",
" sku | \n",
" quantity | \n",
" unit price | \n",
" ext price | \n",
" date | \n",
"
\n",
" \n",
" \n",
" \n",
" 778 | \n",
" 737550 | \n",
" Fritsch, Russel and Anderson | \n",
" S1-65481 | \n",
" 35 | \n",
" 70.51 | \n",
" 2467.85 | \n",
" 2014-07-01 00:21:58 | \n",
"
\n",
" \n",
" 779 | \n",
" 218895 | \n",
" Kulas Inc | \n",
" S1-30248 | \n",
" 9 | \n",
" 16.56 | \n",
" 149.04 | \n",
" 2014-07-01 00:52:38 | \n",
"
\n",
" \n",
" 780 | \n",
" 163416 | \n",
" Purdy-Kunde | \n",
" S2-82423 | \n",
" 44 | \n",
" 68.27 | \n",
" 3003.88 | \n",
" 2014-07-01 08:15:52 | \n",
"
\n",
" \n",
" 781 | \n",
" 672390 | \n",
" Kuhn-Gusikowski | \n",
" B1-04202 | \n",
" 48 | \n",
" 99.39 | \n",
" 4770.72 | \n",
" 2014-07-01 11:12:13 | \n",
"
\n",
" \n",
" 782 | \n",
" 642753 | \n",
" Pollich LLC | \n",
" S2-23246 | \n",
" 1 | \n",
" 51.29 | \n",
" 51.29 | \n",
" 2014-07-02 04:02:39 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" account number name sku quantity \\\n",
"778 737550 Fritsch, Russel and Anderson S1-65481 35 \n",
"779 218895 Kulas Inc S1-30248 9 \n",
"780 163416 Purdy-Kunde S2-82423 44 \n",
"781 672390 Kuhn-Gusikowski B1-04202 48 \n",
"782 642753 Pollich LLC S2-23246 1 \n",
"\n",
" unit price ext price date \n",
"778 70.51 2467.85 2014-07-01 00:21:58 \n",
"779 16.56 149.04 2014-07-01 00:52:38 \n",
"780 68.27 3003.88 2014-07-01 08:15:52 \n",
"781 99.39 4770.72 2014-07-01 11:12:13 \n",
"782 51.29 51.29 2014-07-02 04:02:39 "
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[(df['date'] >='20140701') & (df['date'] <= '20140715')].head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because pandas understands date columns, you can express the date value in multiple formats and it will give you the results you expect."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" account number | \n",
" name | \n",
" sku | \n",
" quantity | \n",
" unit price | \n",
" ext price | \n",
" date | \n",
"
\n",
" \n",
" \n",
" \n",
" 1141 | \n",
" 307599 | \n",
" Kassulke, Ondricka and Metz | \n",
" B1-50809 | \n",
" 25 | \n",
" 56.63 | \n",
" 1415.75 | \n",
" 2014-10-01 10:56:32 | \n",
"
\n",
" \n",
" 1142 | \n",
" 737550 | \n",
" Fritsch, Russel and Anderson | \n",
" S2-82423 | \n",
" 38 | \n",
" 45.17 | \n",
" 1716.46 | \n",
" 2014-10-01 16:17:24 | \n",
"
\n",
" \n",
" 1143 | \n",
" 737550 | \n",
" Fritsch, Russel and Anderson | \n",
" S1-47412 | \n",
" 6 | \n",
" 68.68 | \n",
" 412.08 | \n",
" 2014-10-01 22:28:49 | \n",
"
\n",
" \n",
" 1144 | \n",
" 146832 | \n",
" Kiehn-Spinka | \n",
" S2-11481 | \n",
" 13 | \n",
" 18.80 | \n",
" 244.40 | \n",
" 2014-10-02 00:31:01 | \n",
"
\n",
" \n",
" 1145 | \n",
" 424914 | \n",
" White-Trantow | \n",
" B1-53102 | \n",
" 9 | \n",
" 94.47 | \n",
" 850.23 | \n",
" 2014-10-02 02:48:26 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" account number name sku quantity \\\n",
"1141 307599 Kassulke, Ondricka and Metz B1-50809 25 \n",
"1142 737550 Fritsch, Russel and Anderson S2-82423 38 \n",
"1143 737550 Fritsch, Russel and Anderson S1-47412 6 \n",
"1144 146832 Kiehn-Spinka S2-11481 13 \n",
"1145 424914 White-Trantow B1-53102 9 \n",
"\n",
" unit price ext price date \n",
"1141 56.63 1415.75 2014-10-01 10:56:32 \n",
"1142 45.17 1716.46 2014-10-01 16:17:24 \n",
"1143 68.68 412.08 2014-10-01 22:28:49 \n",
"1144 18.80 244.40 2014-10-02 00:31:01 \n",
"1145 94.47 850.23 2014-10-02 02:48:26 "
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[df['date'] >= 'Oct-2014'].head()"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" account number | \n",
" name | \n",
" sku | \n",
" quantity | \n",
" unit price | \n",
" ext price | \n",
" date | \n",
"
\n",
" \n",
" \n",
" \n",
" 1174 | \n",
" 257198 | \n",
" Cronin, Oberbrunner and Spencer | \n",
" S2-34077 | \n",
" 13 | \n",
" 12.24 | \n",
" 159.12 | \n",
" 2014-10-10 02:59:06 | \n",
"
\n",
" \n",
" 1175 | \n",
" 740150 | \n",
" Barton LLC | \n",
" S1-65481 | \n",
" 28 | \n",
" 53.00 | \n",
" 1484.00 | \n",
" 2014-10-10 15:08:53 | \n",
"
\n",
" \n",
" 1176 | \n",
" 146832 | \n",
" Kiehn-Spinka | \n",
" S1-27722 | \n",
" 15 | \n",
" 64.39 | \n",
" 965.85 | \n",
" 2014-10-10 18:24:01 | \n",
"
\n",
" \n",
" 1177 | \n",
" 257198 | \n",
" Cronin, Oberbrunner and Spencer | \n",
" S2-16558 | \n",
" 3 | \n",
" 35.34 | \n",
" 106.02 | \n",
" 2014-10-11 01:48:13 | \n",
"
\n",
" \n",
" 1178 | \n",
" 737550 | \n",
" Fritsch, Russel and Anderson | \n",
" B1-53636 | \n",
" 10 | \n",
" 56.95 | \n",
" 569.50 | \n",
" 2014-10-11 10:25:53 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" account number name sku quantity \\\n",
"1174 257198 Cronin, Oberbrunner and Spencer S2-34077 13 \n",
"1175 740150 Barton LLC S1-65481 28 \n",
"1176 146832 Kiehn-Spinka S1-27722 15 \n",
"1177 257198 Cronin, Oberbrunner and Spencer S2-16558 3 \n",
"1178 737550 Fritsch, Russel and Anderson B1-53636 10 \n",
"\n",
" unit price ext price date \n",
"1174 12.24 159.12 2014-10-10 02:59:06 \n",
"1175 53.00 1484.00 2014-10-10 15:08:53 \n",
"1176 64.39 965.85 2014-10-10 18:24:01 \n",
"1177 35.34 106.02 2014-10-11 01:48:13 \n",
"1178 56.95 569.50 2014-10-11 10:25:53 "
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[df['date'] >= '10-10-2014'].head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When working with time series data, if we convert the data to use the date as at the index, we can do some more filtering.\n",
"\n",
"Set the new index using `set_index`."
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" account number | \n",
" name | \n",
" sku | \n",
" quantity | \n",
" unit price | \n",
" ext price | \n",
"
\n",
" \n",
" date | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 2014-01-01 07:21:51 | \n",
" 740150 | \n",
" Barton LLC | \n",
" B1-20000 | \n",
" 39 | \n",
" 86.69 | \n",
" 3380.91 | \n",
"
\n",
" \n",
" 2014-01-01 10:00:47 | \n",
" 714466 | \n",
" Trantow-Barrows | \n",
" S2-77896 | \n",
" -1 | \n",
" 63.16 | \n",
" -63.16 | \n",
"
\n",
" \n",
" 2014-01-01 13:24:58 | \n",
" 218895 | \n",
" Kulas Inc | \n",
" B1-69924 | \n",
" 23 | \n",
" 90.70 | \n",
" 2086.10 | \n",
"
\n",
" \n",
" 2014-01-01 15:05:22 | \n",
" 307599 | \n",
" Kassulke, Ondricka and Metz | \n",
" S1-65481 | \n",
" 41 | \n",
" 21.05 | \n",
" 863.05 | \n",
"
\n",
" \n",
" 2014-01-01 23:26:55 | \n",
" 412290 | \n",
" Jerde-Hilpert | \n",
" S2-34077 | \n",
" 6 | \n",
" 83.21 | \n",
" 499.26 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" account number name sku \\\n",
"date \n",
"2014-01-01 07:21:51 740150 Barton LLC B1-20000 \n",
"2014-01-01 10:00:47 714466 Trantow-Barrows S2-77896 \n",
"2014-01-01 13:24:58 218895 Kulas Inc B1-69924 \n",
"2014-01-01 15:05:22 307599 Kassulke, Ondricka and Metz S1-65481 \n",
"2014-01-01 23:26:55 412290 Jerde-Hilpert S2-34077 \n",
"\n",
" quantity unit price ext price \n",
"date \n",
"2014-01-01 07:21:51 39 86.69 3380.91 \n",
"2014-01-01 10:00:47 -1 63.16 -63.16 \n",
"2014-01-01 13:24:58 23 90.70 2086.10 \n",
"2014-01-01 15:05:22 41 21.05 863.05 \n",
"2014-01-01 23:26:55 6 83.21 499.26 "
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df2 = df.set_index(['date'])\n",
"df2.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can slice the data to get a range."
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" account number | \n",
" name | \n",
" sku | \n",
" quantity | \n",
" unit price | \n",
" ext price | \n",
"
\n",
" \n",
" date | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 2014-01-01 07:21:51 | \n",
" 740150 | \n",
" Barton LLC | \n",
" B1-20000 | \n",
" 39 | \n",
" 86.69 | \n",
" 3380.91 | \n",
"
\n",
" \n",
" 2014-01-01 10:00:47 | \n",
" 714466 | \n",
" Trantow-Barrows | \n",
" S2-77896 | \n",
" -1 | \n",
" 63.16 | \n",
" -63.16 | \n",
"
\n",
" \n",
" 2014-01-01 13:24:58 | \n",
" 218895 | \n",
" Kulas Inc | \n",
" B1-69924 | \n",
" 23 | \n",
" 90.70 | \n",
" 2086.10 | \n",
"
\n",
" \n",
" 2014-01-01 15:05:22 | \n",
" 307599 | \n",
" Kassulke, Ondricka and Metz | \n",
" S1-65481 | \n",
" 41 | \n",
" 21.05 | \n",
" 863.05 | \n",
"
\n",
" \n",
" 2014-01-01 23:26:55 | \n",
" 412290 | \n",
" Jerde-Hilpert | \n",
" S2-34077 | \n",
" 6 | \n",
" 83.21 | \n",
" 499.26 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" account number name sku \\\n",
"date \n",
"2014-01-01 07:21:51 740150 Barton LLC B1-20000 \n",
"2014-01-01 10:00:47 714466 Trantow-Barrows S2-77896 \n",
"2014-01-01 13:24:58 218895 Kulas Inc B1-69924 \n",
"2014-01-01 15:05:22 307599 Kassulke, Ondricka and Metz S1-65481 \n",
"2014-01-01 23:26:55 412290 Jerde-Hilpert S2-34077 \n",
"\n",
" quantity unit price ext price \n",
"date \n",
"2014-01-01 07:21:51 39 86.69 3380.91 \n",
"2014-01-01 10:00:47 -1 63.16 -63.16 \n",
"2014-01-01 13:24:58 23 90.70 2086.10 \n",
"2014-01-01 15:05:22 41 21.05 863.05 \n",
"2014-01-01 23:26:55 6 83.21 499.26 "
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df2[\"20140101\":\"20140201\"].head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once again, we can use various date representations to remove any ambiguity around date naming conventions."
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" account number | \n",
" name | \n",
" sku | \n",
" quantity | \n",
" unit price | \n",
" ext price | \n",
"
\n",
" \n",
" date | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 2014-01-01 07:21:51 | \n",
" 740150 | \n",
" Barton LLC | \n",
" B1-20000 | \n",
" 39 | \n",
" 86.69 | \n",
" 3380.91 | \n",
"
\n",
" \n",
" 2014-01-01 10:00:47 | \n",
" 714466 | \n",
" Trantow-Barrows | \n",
" S2-77896 | \n",
" -1 | \n",
" 63.16 | \n",
" -63.16 | \n",
"
\n",
" \n",
" 2014-01-01 13:24:58 | \n",
" 218895 | \n",
" Kulas Inc | \n",
" B1-69924 | \n",
" 23 | \n",
" 90.70 | \n",
" 2086.10 | \n",
"
\n",
" \n",
" 2014-01-01 15:05:22 | \n",
" 307599 | \n",
" Kassulke, Ondricka and Metz | \n",
" S1-65481 | \n",
" 41 | \n",
" 21.05 | \n",
" 863.05 | \n",
"
\n",
" \n",
" 2014-01-01 23:26:55 | \n",
" 412290 | \n",
" Jerde-Hilpert | \n",
" S2-34077 | \n",
" 6 | \n",
" 83.21 | \n",
" 499.26 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" account number name sku \\\n",
"date \n",
"2014-01-01 07:21:51 740150 Barton LLC B1-20000 \n",
"2014-01-01 10:00:47 714466 Trantow-Barrows S2-77896 \n",
"2014-01-01 13:24:58 218895 Kulas Inc B1-69924 \n",
"2014-01-01 15:05:22 307599 Kassulke, Ondricka and Metz S1-65481 \n",
"2014-01-01 23:26:55 412290 Jerde-Hilpert S2-34077 \n",
"\n",
" quantity unit price ext price \n",
"date \n",
"2014-01-01 07:21:51 39 86.69 3380.91 \n",
"2014-01-01 10:00:47 -1 63.16 -63.16 \n",
"2014-01-01 13:24:58 23 90.70 2086.10 \n",
"2014-01-01 15:05:22 41 21.05 863.05 \n",
"2014-01-01 23:26:55 6 83.21 499.26 "
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df2[\"2014-Jan-1\":\"2014-Feb-1\"].head()"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" account number | \n",
" name | \n",
" sku | \n",
" quantity | \n",
" unit price | \n",
" ext price | \n",
"
\n",
" \n",
" date | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 2014-01-31 22:51:18 | \n",
" 383080 | \n",
" Will LLC | \n",
" B1-05914 | \n",
" 43 | \n",
" 80.17 | \n",
" 3447.31 | \n",
"
\n",
" \n",
" 2014-02-01 09:04:59 | \n",
" 383080 | \n",
" Will LLC | \n",
" B1-20000 | \n",
" 7 | \n",
" 33.69 | \n",
" 235.83 | \n",
"
\n",
" \n",
" 2014-02-01 11:51:46 | \n",
" 412290 | \n",
" Jerde-Hilpert | \n",
" S1-27722 | \n",
" 11 | \n",
" 21.12 | \n",
" 232.32 | \n",
"
\n",
" \n",
" 2014-02-01 17:24:32 | \n",
" 412290 | \n",
" Jerde-Hilpert | \n",
" B1-86481 | \n",
" 3 | \n",
" 35.99 | \n",
" 107.97 | \n",
"
\n",
" \n",
" 2014-02-01 19:56:48 | \n",
" 412290 | \n",
" Jerde-Hilpert | \n",
" B1-20000 | \n",
" 23 | \n",
" 78.90 | \n",
" 1814.70 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" account number name sku quantity \\\n",
"date \n",
"2014-01-31 22:51:18 383080 Will LLC B1-05914 43 \n",
"2014-02-01 09:04:59 383080 Will LLC B1-20000 7 \n",
"2014-02-01 11:51:46 412290 Jerde-Hilpert S1-27722 11 \n",
"2014-02-01 17:24:32 412290 Jerde-Hilpert B1-86481 3 \n",
"2014-02-01 19:56:48 412290 Jerde-Hilpert B1-20000 23 \n",
"\n",
" unit price ext price \n",
"date \n",
"2014-01-31 22:51:18 80.17 3447.31 \n",
"2014-02-01 09:04:59 33.69 235.83 \n",
"2014-02-01 11:51:46 21.12 232.32 \n",
"2014-02-01 17:24:32 35.99 107.97 \n",
"2014-02-01 19:56:48 78.90 1814.70 "
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df2[\"2014-Jan-1\":\"2014-Feb-1\"].tail()"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" account number | \n",
" name | \n",
" sku | \n",
" quantity | \n",
" unit price | \n",
" ext price | \n",
"
\n",
" \n",
" date | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 2014-01-01 07:21:51 | \n",
" 740150 | \n",
" Barton LLC | \n",
" B1-20000 | \n",
" 39 | \n",
" 86.69 | \n",
" 3380.91 | \n",
"
\n",
" \n",
" 2014-01-01 10:00:47 | \n",
" 714466 | \n",
" Trantow-Barrows | \n",
" S2-77896 | \n",
" -1 | \n",
" 63.16 | \n",
" -63.16 | \n",
"
\n",
" \n",
" 2014-01-01 13:24:58 | \n",
" 218895 | \n",
" Kulas Inc | \n",
" B1-69924 | \n",
" 23 | \n",
" 90.70 | \n",
" 2086.10 | \n",
"
\n",
" \n",
" 2014-01-01 15:05:22 | \n",
" 307599 | \n",
" Kassulke, Ondricka and Metz | \n",
" S1-65481 | \n",
" 41 | \n",
" 21.05 | \n",
" 863.05 | \n",
"
\n",
" \n",
" 2014-01-01 23:26:55 | \n",
" 412290 | \n",
" Jerde-Hilpert | \n",
" S2-34077 | \n",
" 6 | \n",
" 83.21 | \n",
" 499.26 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" account number name sku \\\n",
"date \n",
"2014-01-01 07:21:51 740150 Barton LLC B1-20000 \n",
"2014-01-01 10:00:47 714466 Trantow-Barrows S2-77896 \n",
"2014-01-01 13:24:58 218895 Kulas Inc B1-69924 \n",
"2014-01-01 15:05:22 307599 Kassulke, Ondricka and Metz S1-65481 \n",
"2014-01-01 23:26:55 412290 Jerde-Hilpert S2-34077 \n",
"\n",
" quantity unit price ext price \n",
"date \n",
"2014-01-01 07:21:51 39 86.69 3380.91 \n",
"2014-01-01 10:00:47 -1 63.16 -63.16 \n",
"2014-01-01 13:24:58 23 90.70 2086.10 \n",
"2014-01-01 15:05:22 41 21.05 863.05 \n",
"2014-01-01 23:26:55 6 83.21 499.26 "
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df2[\"2014\"].head()"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" account number | \n",
" name | \n",
" sku | \n",
" quantity | \n",
" unit price | \n",
" ext price | \n",
"
\n",
" \n",
" date | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 2014-12-01 20:15:34 | \n",
" 714466 | \n",
" Trantow-Barrows | \n",
" S1-82801 | \n",
" 3 | \n",
" 77.97 | \n",
" 233.91 | \n",
"
\n",
" \n",
" 2014-12-02 20:00:04 | \n",
" 146832 | \n",
" Kiehn-Spinka | \n",
" S2-23246 | \n",
" 37 | \n",
" 57.81 | \n",
" 2138.97 | \n",
"
\n",
" \n",
" 2014-12-03 04:43:53 | \n",
" 218895 | \n",
" Kulas Inc | \n",
" S2-77896 | \n",
" 30 | \n",
" 77.44 | \n",
" 2323.20 | \n",
"
\n",
" \n",
" 2014-12-03 06:05:43 | \n",
" 141962 | \n",
" Herman LLC | \n",
" B1-53102 | \n",
" 20 | \n",
" 26.12 | \n",
" 522.40 | \n",
"
\n",
" \n",
" 2014-12-03 14:17:34 | \n",
" 642753 | \n",
" Pollich LLC | \n",
" B1-53636 | \n",
" 19 | \n",
" 71.21 | \n",
" 1352.99 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" account number name sku quantity \\\n",
"date \n",
"2014-12-01 20:15:34 714466 Trantow-Barrows S1-82801 3 \n",
"2014-12-02 20:00:04 146832 Kiehn-Spinka S2-23246 37 \n",
"2014-12-03 04:43:53 218895 Kulas Inc S2-77896 30 \n",
"2014-12-03 06:05:43 141962 Herman LLC B1-53102 20 \n",
"2014-12-03 14:17:34 642753 Pollich LLC B1-53636 19 \n",
"\n",
" unit price ext price \n",
"date \n",
"2014-12-01 20:15:34 77.97 233.91 \n",
"2014-12-02 20:00:04 57.81 2138.97 \n",
"2014-12-03 04:43:53 77.44 2323.20 \n",
"2014-12-03 06:05:43 26.12 522.40 \n",
"2014-12-03 14:17:34 71.21 1352.99 "
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df2[\"2014-Dec\"].head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Additional String Functions"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Pandas has support for vectorized string functions as well. If we want to identify all the skus that contain a certain value, we can use `str.contains`. In this case, we know that the sku is always represented in the same way, so B1 only shows up in the front of the sku."
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" account number | \n",
" name | \n",
" sku | \n",
" quantity | \n",
" unit price | \n",
" ext price | \n",
" date | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 740150 | \n",
" Barton LLC | \n",
" B1-20000 | \n",
" 39 | \n",
" 86.69 | \n",
" 3380.91 | \n",
" 2014-01-01 07:21:51 | \n",
"
\n",
" \n",
" 2 | \n",
" 218895 | \n",
" Kulas Inc | \n",
" B1-69924 | \n",
" 23 | \n",
" 90.70 | \n",
" 2086.10 | \n",
" 2014-01-01 13:24:58 | \n",
"
\n",
" \n",
" 6 | \n",
" 218895 | \n",
" Kulas Inc | \n",
" B1-65551 | \n",
" 2 | \n",
" 31.10 | \n",
" 62.20 | \n",
" 2014-01-02 10:57:23 | \n",
"
\n",
" \n",
" 14 | \n",
" 737550 | \n",
" Fritsch, Russel and Anderson | \n",
" B1-53102 | \n",
" 23 | \n",
" 71.56 | \n",
" 1645.88 | \n",
" 2014-01-04 08:57:48 | \n",
"
\n",
" \n",
" 17 | \n",
" 239344 | \n",
" Stokes LLC | \n",
" B1-50809 | \n",
" 14 | \n",
" 16.23 | \n",
" 227.22 | \n",
" 2014-01-04 22:14:32 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" account number name sku quantity \\\n",
"0 740150 Barton LLC B1-20000 39 \n",
"2 218895 Kulas Inc B1-69924 23 \n",
"6 218895 Kulas Inc B1-65551 2 \n",
"14 737550 Fritsch, Russel and Anderson B1-53102 23 \n",
"17 239344 Stokes LLC B1-50809 14 \n",
"\n",
" unit price ext price date \n",
"0 86.69 3380.91 2014-01-01 07:21:51 \n",
"2 90.70 2086.10 2014-01-01 13:24:58 \n",
"6 31.10 62.20 2014-01-02 10:57:23 \n",
"14 71.56 1645.88 2014-01-04 08:57:48 \n",
"17 16.23 227.22 2014-01-04 22:14:32 "
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[df['sku'].str.contains('B1')].head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can string queries together and use sort to control how the data is ordered."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A common need I have in Excel is to understand all the unique items in a column. For instance, maybe I only want to know when customers purchased in this time period. The unique function makes this trivial."
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" account number | \n",
" name | \n",
" sku | \n",
" quantity | \n",
" unit price | \n",
" ext price | \n",
" date | \n",
"
\n",
" \n",
" \n",
" \n",
" 684 | \n",
" 642753 | \n",
" Pollich LLC | \n",
" B1-53102 | \n",
" 46 | \n",
" 26.07 | \n",
" 1199.22 | \n",
" 2014-06-08 19:33:33 | \n",
"
\n",
" \n",
" 792 | \n",
" 688981 | \n",
" Keeling LLC | \n",
" B1-53102 | \n",
" 45 | \n",
" 41.19 | \n",
" 1853.55 | \n",
" 2014-07-04 21:42:22 | \n",
"
\n",
" \n",
" 176 | \n",
" 383080 | \n",
" Will LLC | \n",
" B1-53102 | \n",
" 45 | \n",
" 89.22 | \n",
" 4014.90 | \n",
" 2014-02-11 04:14:09 | \n",
"
\n",
" \n",
" 1213 | \n",
" 604255 | \n",
" Halvorson, Crona and Champlin | \n",
" B1-53102 | \n",
" 41 | \n",
" 55.05 | \n",
" 2257.05 | \n",
" 2014-10-18 19:27:01 | \n",
"
\n",
" \n",
" 1215 | \n",
" 307599 | \n",
" Kassulke, Ondricka and Metz | \n",
" B1-53102 | \n",
" 41 | \n",
" 93.70 | \n",
" 3841.70 | \n",
" 2014-10-18 23:25:10 | \n",
"
\n",
" \n",
" 1128 | \n",
" 714466 | \n",
" Trantow-Barrows | \n",
" B1-53102 | \n",
" 41 | \n",
" 55.68 | \n",
" 2282.88 | \n",
" 2014-09-27 10:42:48 | \n",
"
\n",
" \n",
" 1001 | \n",
" 424914 | \n",
" White-Trantow | \n",
" B1-53102 | \n",
" 41 | \n",
" 81.25 | \n",
" 3331.25 | \n",
" 2014-08-26 11:44:30 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" account number name sku quantity \\\n",
"684 642753 Pollich LLC B1-53102 46 \n",
"792 688981 Keeling LLC B1-53102 45 \n",
"176 383080 Will LLC B1-53102 45 \n",
"1213 604255 Halvorson, Crona and Champlin B1-53102 41 \n",
"1215 307599 Kassulke, Ondricka and Metz B1-53102 41 \n",
"1128 714466 Trantow-Barrows B1-53102 41 \n",
"1001 424914 White-Trantow B1-53102 41 \n",
"\n",
" unit price ext price date \n",
"684 26.07 1199.22 2014-06-08 19:33:33 \n",
"792 41.19 1853.55 2014-07-04 21:42:22 \n",
"176 89.22 4014.90 2014-02-11 04:14:09 \n",
"1213 55.05 2257.05 2014-10-18 19:27:01 \n",
"1215 93.70 3841.70 2014-10-18 23:25:10 \n",
"1128 55.68 2282.88 2014-09-27 10:42:48 \n",
"1001 81.25 3331.25 2014-08-26 11:44:30 "
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[(df['sku'].str.contains('B1-531')) & (df['quantity']>40)].sort_values(by=['quantity','name'],ascending=[0,1])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Bonus Task"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"I frequently find myself trying to get a list of unique items in a long list within Excel. It is a multi-step process to do this in Excel but is fairly simple in pandas. We just use the `unique` function on a column to get the list."
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array(['Barton LLC', 'Trantow-Barrows', 'Kulas Inc',\n",
" 'Kassulke, Ondricka and Metz', 'Jerde-Hilpert', 'Koepp Ltd',\n",
" 'Fritsch, Russel and Anderson', 'Kiehn-Spinka', 'Keeling LLC',\n",
" 'Frami, Hills and Schmidt', 'Stokes LLC', 'Kuhn-Gusikowski',\n",
" 'Herman LLC', 'White-Trantow', 'Sanford and Sons', 'Pollich LLC',\n",
" 'Will LLC', 'Cronin, Oberbrunner and Spencer',\n",
" 'Halvorson, Crona and Champlin', 'Purdy-Kunde'], dtype=object)"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[\"name\"].unique()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If we wanted to include the account number, we could use `drop_duplicates`."
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" account number | \n",
" name | \n",
" sku | \n",
" quantity | \n",
" unit price | \n",
" ext price | \n",
" date | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 740150 | \n",
" Barton LLC | \n",
" B1-20000 | \n",
" 39 | \n",
" 86.69 | \n",
" 3380.91 | \n",
" 2014-01-01 07:21:51 | \n",
"
\n",
" \n",
" 1 | \n",
" 714466 | \n",
" Trantow-Barrows | \n",
" S2-77896 | \n",
" -1 | \n",
" 63.16 | \n",
" -63.16 | \n",
" 2014-01-01 10:00:47 | \n",
"
\n",
" \n",
" 2 | \n",
" 218895 | \n",
" Kulas Inc | \n",
" B1-69924 | \n",
" 23 | \n",
" 90.70 | \n",
" 2086.10 | \n",
" 2014-01-01 13:24:58 | \n",
"
\n",
" \n",
" 3 | \n",
" 307599 | \n",
" Kassulke, Ondricka and Metz | \n",
" S1-65481 | \n",
" 41 | \n",
" 21.05 | \n",
" 863.05 | \n",
" 2014-01-01 15:05:22 | \n",
"
\n",
" \n",
" 4 | \n",
" 412290 | \n",
" Jerde-Hilpert | \n",
" S2-34077 | \n",
" 6 | \n",
" 83.21 | \n",
" 499.26 | \n",
" 2014-01-01 23:26:55 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" account number name sku quantity \\\n",
"0 740150 Barton LLC B1-20000 39 \n",
"1 714466 Trantow-Barrows S2-77896 -1 \n",
"2 218895 Kulas Inc B1-69924 23 \n",
"3 307599 Kassulke, Ondricka and Metz S1-65481 41 \n",
"4 412290 Jerde-Hilpert S2-34077 6 \n",
"\n",
" unit price ext price date \n",
"0 86.69 3380.91 2014-01-01 07:21:51 \n",
"1 63.16 -63.16 2014-01-01 10:00:47 \n",
"2 90.70 2086.10 2014-01-01 13:24:58 \n",
"3 21.05 863.05 2014-01-01 15:05:22 \n",
"4 83.21 499.26 2014-01-01 23:26:55 "
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.drop_duplicates(subset=[\"account number\",\"name\"]).head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We are obviously pulling in more data than we need and getting some non-useful information, so select only the first and second columns using `ix`."
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" account number | \n",
" name | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 740150 | \n",
" Barton LLC | \n",
"
\n",
" \n",
" 1 | \n",
" 714466 | \n",
" Trantow-Barrows | \n",
"
\n",
" \n",
" 2 | \n",
" 218895 | \n",
" Kulas Inc | \n",
"
\n",
" \n",
" 3 | \n",
" 307599 | \n",
" Kassulke, Ondricka and Metz | \n",
"
\n",
" \n",
" 4 | \n",
" 412290 | \n",
" Jerde-Hilpert | \n",
"
\n",
" \n",
" 7 | \n",
" 729833 | \n",
" Koepp Ltd | \n",
"
\n",
" \n",
" 9 | \n",
" 737550 | \n",
" Fritsch, Russel and Anderson | \n",
"
\n",
" \n",
" 10 | \n",
" 146832 | \n",
" Kiehn-Spinka | \n",
"
\n",
" \n",
" 11 | \n",
" 688981 | \n",
" Keeling LLC | \n",
"
\n",
" \n",
" 12 | \n",
" 786968 | \n",
" Frami, Hills and Schmidt | \n",
"
\n",
" \n",
" 15 | \n",
" 239344 | \n",
" Stokes LLC | \n",
"
\n",
" \n",
" 16 | \n",
" 672390 | \n",
" Kuhn-Gusikowski | \n",
"
\n",
" \n",
" 18 | \n",
" 141962 | \n",
" Herman LLC | \n",
"
\n",
" \n",
" 20 | \n",
" 424914 | \n",
" White-Trantow | \n",
"
\n",
" \n",
" 21 | \n",
" 527099 | \n",
" Sanford and Sons | \n",
"
\n",
" \n",
" 30 | \n",
" 642753 | \n",
" Pollich LLC | \n",
"
\n",
" \n",
" 37 | \n",
" 383080 | \n",
" Will LLC | \n",
"
\n",
" \n",
" 51 | \n",
" 257198 | \n",
" Cronin, Oberbrunner and Spencer | \n",
"
\n",
" \n",
" 67 | \n",
" 604255 | \n",
" Halvorson, Crona and Champlin | \n",
"
\n",
" \n",
" 106 | \n",
" 163416 | \n",
" Purdy-Kunde | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" account number name\n",
"0 740150 Barton LLC\n",
"1 714466 Trantow-Barrows\n",
"2 218895 Kulas Inc\n",
"3 307599 Kassulke, Ondricka and Metz\n",
"4 412290 Jerde-Hilpert\n",
"7 729833 Koepp Ltd\n",
"9 737550 Fritsch, Russel and Anderson\n",
"10 146832 Kiehn-Spinka\n",
"11 688981 Keeling LLC\n",
"12 786968 Frami, Hills and Schmidt\n",
"15 239344 Stokes LLC\n",
"16 672390 Kuhn-Gusikowski\n",
"18 141962 Herman LLC\n",
"20 424914 White-Trantow\n",
"21 527099 Sanford and Sons\n",
"30 642753 Pollich LLC\n",
"37 383080 Will LLC\n",
"51 257198 Cronin, Oberbrunner and Spencer\n",
"67 604255 Halvorson, Crona and Champlin\n",
"106 163416 Purdy-Kunde"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.drop_duplicates(subset=[\"account number\",\"name\"]).iloc[:,[0,1]]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"I hope you found this useful. I encourage you to try and apply these ideas to some of your own repetitive Excel tasks and streamline your work flow."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.5"
}
},
"nbformat": 4,
"nbformat_minor": 1
}