{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Market Basket Analysis"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Association Analysis"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Get pandas and MLxtend code imported and read the data:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"#imports\n",
"import pandas as pd #Python data analysis library\n",
"import numpy as np #Python scientific computing\n",
"from mlxtend.frequent_patterns import apriori\n",
"from mlxtend.frequent_patterns import association_rules"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Citation - “The Instacart Online Grocery Shopping Dataset 2017”, Accessed from https://www.instacart.com/datasets/grocery-shopping-2017\""
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"#import dataset\n",
"trainDf = pd.read_csv(\"../data/raw/order_products__train.csv\")\n",
"orderDf = pd.read_csv(\"../data/raw/orders.csv\")\n",
"productDf = pd.read_csv(\"../data/raw/products.csv\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We will merge the prior and train dataset to get the complete order dataset. We will use append() for doing so."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"priorDf = pd.read_csv(\"../data/raw/order_products__prior.csv\")\n",
"trainDf = trainDf.append(priorDf,ignore_index = True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For counting each product, we can assign reordered column as 1 "
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"trainDf['reordered'] = 1 "
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"productCountDf = trainDf.groupby(\"product_id\",as_index = False)[\"order_id\"].count()"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"#Top 100 most frequently purchased products\n",
"topLev = 100\n",
"\n",
"#Here order_id is the count so we need to sort the data frame w.r.t order_id\n",
"productCountDf = productCountDf.sort_values(\"order_id\",ascending = False)\n",
"\n",
"topProdFrame = productCountDf.iloc[0:topLev,:]\n",
"topProdFrame = topProdFrame.merge(productDf,on = \"product_id\")\n",
"productId= topProdFrame.loc[:,[\"product_id\"]]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we will filter the orders and get orders containting the the most frequently purchased products"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"df = trainDf[0:0]\n",
"for i in range(0,99):\n",
" pId = productId.iloc[i]['product_id'] \n",
" stDf = trainDf[trainDf.product_id == pId ]\n",
" df = df.append(stDf,ignore_index = False)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" order_id | \n",
" product_id | \n",
" add_to_cart_order | \n",
" reordered | \n",
"
\n",
" \n",
" \n",
" \n",
" 115 | \n",
" 226 | \n",
" 24852 | \n",
" 2 | \n",
" 1 | \n",
"
\n",
" \n",
" 156 | \n",
" 473 | \n",
" 24852 | \n",
" 2 | \n",
" 1 | \n",
"
\n",
" \n",
" 196 | \n",
" 878 | \n",
" 24852 | \n",
" 2 | \n",
" 1 | \n",
"
\n",
" \n",
" 272 | \n",
" 1042 | \n",
" 24852 | \n",
" 1 | \n",
" 1 | \n",
"
\n",
" \n",
" 297 | \n",
" 1139 | \n",
" 24852 | \n",
" 1 | \n",
" 1 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" order_id product_id add_to_cart_order reordered\n",
"115 226 24852 2 1\n",
"156 473 24852 2 1\n",
"196 878 24852 2 1\n",
"272 1042 24852 1 1\n",
"297 1139 24852 1 1"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" Now we need to consolidate the items into 1 transaction per row with each product 1 hot encoded. Each row will represent an order and each column will represent product_id. If the cell value is '1' say (i,j) then ith order contains jth product."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"basket = df.groupby(['order_id', 'product_id'])['reordered'].sum().unstack().reset_index().fillna(0).set_index('order_id')"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"# Convert the units to 1 hot encoded values\n",
"def encode_units(x):\n",
" if x <= 0:\n",
" return 0\n",
" if x >= 1:\n",
" return 1 "
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"basket_sets = basket.applymap(encode_units)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" product_id | \n",
" 196 | \n",
" 3957 | \n",
" 4210 | \n",
" 4605 | \n",
" 4799 | \n",
" 4920 | \n",
" 5077 | \n",
" 5450 | \n",
" 5785 | \n",
" 5876 | \n",
" ... | \n",
" 46667 | \n",
" 46906 | \n",
" 46979 | \n",
" 47144 | \n",
" 47209 | \n",
" 47626 | \n",
" 47766 | \n",
" 48679 | \n",
" 49235 | \n",
" 49683 | \n",
"
\n",
" \n",
" order_id | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" ... | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 1 | \n",
"
\n",
" \n",
" 2 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" ... | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
"
\n",
" \n",
" 3 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" ... | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
"
\n",
" \n",
" 5 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" ... | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
"
\n",
" \n",
" 9 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" ... | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
"
\n",
" \n",
"
\n",
"
5 rows × 99 columns
\n",
"
"
],
"text/plain": [
"product_id 196 3957 4210 4605 4799 4920 5077 5450 5785 \\\n",
"order_id \n",
"1 0 0 0 0 0 0 0 0 0 \n",
"2 0 0 0 0 0 0 0 0 0 \n",
"3 0 0 0 0 0 0 0 0 0 \n",
"5 0 0 0 0 0 0 0 0 0 \n",
"9 0 0 0 0 0 0 0 0 0 \n",
"\n",
"product_id 5876 ... 46667 46906 46979 47144 47209 47626 47766 \\\n",
"order_id ... \n",
"1 0 ... 0 0 0 0 1 0 0 \n",
"2 0 ... 0 0 0 0 0 0 0 \n",
"3 0 ... 1 0 0 0 0 0 0 \n",
"5 0 ... 0 0 0 0 1 0 0 \n",
"9 0 ... 0 0 0 0 0 0 0 \n",
"\n",
"product_id 48679 49235 49683 \n",
"order_id \n",
"1 0 0 1 \n",
"2 0 0 0 \n",
"3 0 0 0 \n",
"5 0 0 0 \n",
"9 0 0 0 \n",
"\n",
"[5 rows x 99 columns]"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"basket_sets.head()"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"241667217"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"basket_sets.size"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that the data is structured properly, we can generate frequent item sets that have a support of at least 1% "
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"# Build up the frequent items\n",
"frequent_itemsets = apriori(basket_sets, min_support=0.01, use_colnames=True)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" support | \n",
" itemsets | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 0.015279 | \n",
" [196] | \n",
"
\n",
" \n",
" 1 | \n",
" 0.016088 | \n",
" [3957] | \n",
"
\n",
" \n",
" 2 | \n",
" 0.015144 | \n",
" [4210] | \n",
"
\n",
" \n",
" 3 | \n",
" 0.031514 | \n",
" [4605] | \n",
"
\n",
" \n",
" 4 | \n",
" 0.015439 | \n",
" [4799] | \n",
"
\n",
" \n",
" 5 | \n",
" 0.035537 | \n",
" [4920] | \n",
"
\n",
" \n",
" 6 | \n",
" 0.025855 | \n",
" [5077] | \n",
"
\n",
" \n",
" 7 | \n",
" 0.021353 | \n",
" [5450] | \n",
"
\n",
" \n",
" 8 | \n",
" 0.020226 | \n",
" [5785] | \n",
"
\n",
" \n",
" 9 | \n",
" 0.037381 | \n",
" [5876] | \n",
"
\n",
" \n",
" 10 | \n",
" 0.014564 | \n",
" [7781] | \n",
"
\n",
" \n",
" 11 | \n",
" 0.018312 | \n",
" [8174] | \n",
"
\n",
" \n",
" 12 | \n",
" 0.035751 | \n",
" [8277] | \n",
"
\n",
" \n",
" 13 | \n",
" 0.018425 | \n",
" [8424] | \n",
"
\n",
" \n",
" 14 | \n",
" 0.029005 | \n",
" [8518] | \n",
"
\n",
" \n",
" 15 | \n",
" 0.023870 | \n",
" [9076] | \n",
"
\n",
" \n",
" 16 | \n",
" 0.015389 | \n",
" [9387] | \n",
"
\n",
" \n",
" 17 | \n",
" 0.016539 | \n",
" [9839] | \n",
"
\n",
" \n",
" 18 | \n",
" 0.024529 | \n",
" [10749] | \n",
"
\n",
" \n",
" 19 | \n",
" 0.017318 | \n",
" [11520] | \n",
"
\n",
" \n",
" 20 | \n",
" 0.021506 | \n",
" [12341] | \n",
"
\n",
" \n",
" 21 | \n",
" 0.161785 | \n",
" [13176] | \n",
"
\n",
" \n",
" 22 | \n",
" 0.016817 | \n",
" [15290] | \n",
"
\n",
" \n",
" 23 | \n",
" 0.015387 | \n",
" [16759] | \n",
"
\n",
" \n",
" 24 | \n",
" 0.061221 | \n",
" [16797] | \n",
"
\n",
" \n",
" 25 | \n",
" 0.030820 | \n",
" [17794] | \n",
"
\n",
" \n",
" 26 | \n",
" 0.017128 | \n",
" [18465] | \n",
"
\n",
" \n",
" 27 | \n",
" 0.031976 | \n",
" [19057] | \n",
"
\n",
" \n",
" 28 | \n",
" 0.023888 | \n",
" [19660] | \n",
"
\n",
" \n",
" 29 | \n",
" 0.018213 | \n",
" [20114] | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 98 | \n",
" 0.040854 | \n",
" [49683] | \n",
"
\n",
" \n",
" 99 | \n",
" 0.010551 | \n",
" [4920, 24852] | \n",
"
\n",
" \n",
" 100 | \n",
" 0.026505 | \n",
" [13176, 21137] | \n",
"
\n",
" \n",
" 101 | \n",
" 0.021551 | \n",
" [13176, 21903] | \n",
"
\n",
" \n",
" 102 | \n",
" 0.010476 | \n",
" [13176, 22935] | \n",
"
\n",
" \n",
" 103 | \n",
" 0.011306 | \n",
" [13176, 27845] | \n",
"
\n",
" \n",
" 104 | \n",
" 0.017321 | \n",
" [13176, 27966] | \n",
"
\n",
" \n",
" 105 | \n",
" 0.026530 | \n",
" [13176, 47209] | \n",
"
\n",
" \n",
" 106 | \n",
" 0.017689 | \n",
" [16797, 24852] | \n",
"
\n",
" \n",
" 107 | \n",
" 0.016293 | \n",
" [21137, 21903] | \n",
"
\n",
" \n",
" 108 | \n",
" 0.023895 | \n",
" [21137, 24852] | \n",
"
\n",
" \n",
" 109 | \n",
" 0.010146 | \n",
" [21137, 27845] | \n",
"
\n",
" \n",
" 110 | \n",
" 0.014556 | \n",
" [21137, 27966] | \n",
"
\n",
" \n",
" 111 | \n",
" 0.010252 | \n",
" [21137, 39275] | \n",
"
\n",
" \n",
" 112 | \n",
" 0.017342 | \n",
" [21137, 47209] | \n",
"
\n",
" \n",
" 113 | \n",
" 0.010271 | \n",
" [21137, 47766] | \n",
"
\n",
" \n",
" 114 | \n",
" 0.021873 | \n",
" [21903, 24852] | \n",
"
\n",
" \n",
" 115 | \n",
" 0.014810 | \n",
" [21903, 47209] | \n",
"
\n",
" \n",
" 116 | \n",
" 0.013228 | \n",
" [21903, 47766] | \n",
"
\n",
" \n",
" 117 | \n",
" 0.013561 | \n",
" [24852, 26209] | \n",
"
\n",
" \n",
" 118 | \n",
" 0.013390 | \n",
" [24852, 27845] | \n",
"
\n",
" \n",
" 119 | \n",
" 0.014401 | \n",
" [24852, 28204] | \n",
"
\n",
" \n",
" 120 | \n",
" 0.012142 | \n",
" [24852, 45066] | \n",
"
\n",
" \n",
" 121 | \n",
" 0.013178 | \n",
" [24852, 47209] | \n",
"
\n",
" \n",
" 122 | \n",
" 0.017631 | \n",
" [24852, 47626] | \n",
"
\n",
" \n",
" 123 | \n",
" 0.022781 | \n",
" [24852, 47766] | \n",
"
\n",
" \n",
" 124 | \n",
" 0.013453 | \n",
" [24852, 49683] | \n",
"
\n",
" \n",
" 125 | \n",
" 0.011879 | \n",
" [26209, 47626] | \n",
"
\n",
" \n",
" 126 | \n",
" 0.010984 | \n",
" [27966, 47209] | \n",
"
\n",
" \n",
" 127 | \n",
" 0.010555 | \n",
" [47626, 47766] | \n",
"
\n",
" \n",
"
\n",
"
128 rows × 2 columns
\n",
"
"
],
"text/plain": [
" support itemsets\n",
"0 0.015279 [196]\n",
"1 0.016088 [3957]\n",
"2 0.015144 [4210]\n",
"3 0.031514 [4605]\n",
"4 0.015439 [4799]\n",
"5 0.035537 [4920]\n",
"6 0.025855 [5077]\n",
"7 0.021353 [5450]\n",
"8 0.020226 [5785]\n",
"9 0.037381 [5876]\n",
"10 0.014564 [7781]\n",
"11 0.018312 [8174]\n",
"12 0.035751 [8277]\n",
"13 0.018425 [8424]\n",
"14 0.029005 [8518]\n",
"15 0.023870 [9076]\n",
"16 0.015389 [9387]\n",
"17 0.016539 [9839]\n",
"18 0.024529 [10749]\n",
"19 0.017318 [11520]\n",
"20 0.021506 [12341]\n",
"21 0.161785 [13176]\n",
"22 0.016817 [15290]\n",
"23 0.015387 [16759]\n",
"24 0.061221 [16797]\n",
"25 0.030820 [17794]\n",
"26 0.017128 [18465]\n",
"27 0.031976 [19057]\n",
"28 0.023888 [19660]\n",
"29 0.018213 [20114]\n",
".. ... ...\n",
"98 0.040854 [49683]\n",
"99 0.010551 [4920, 24852]\n",
"100 0.026505 [13176, 21137]\n",
"101 0.021551 [13176, 21903]\n",
"102 0.010476 [13176, 22935]\n",
"103 0.011306 [13176, 27845]\n",
"104 0.017321 [13176, 27966]\n",
"105 0.026530 [13176, 47209]\n",
"106 0.017689 [16797, 24852]\n",
"107 0.016293 [21137, 21903]\n",
"108 0.023895 [21137, 24852]\n",
"109 0.010146 [21137, 27845]\n",
"110 0.014556 [21137, 27966]\n",
"111 0.010252 [21137, 39275]\n",
"112 0.017342 [21137, 47209]\n",
"113 0.010271 [21137, 47766]\n",
"114 0.021873 [21903, 24852]\n",
"115 0.014810 [21903, 47209]\n",
"116 0.013228 [21903, 47766]\n",
"117 0.013561 [24852, 26209]\n",
"118 0.013390 [24852, 27845]\n",
"119 0.014401 [24852, 28204]\n",
"120 0.012142 [24852, 45066]\n",
"121 0.013178 [24852, 47209]\n",
"122 0.017631 [24852, 47626]\n",
"123 0.022781 [24852, 47766]\n",
"124 0.013453 [24852, 49683]\n",
"125 0.011879 [26209, 47626]\n",
"126 0.010984 [27966, 47209]\n",
"127 0.010555 [47626, 47766]\n",
"\n",
"[128 rows x 2 columns]"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"frequent_itemsets"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The final step is to generate the rules with their corresponding support, confidence and lift:"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" antecedants | \n",
" consequents | \n",
" support | \n",
" confidence | \n",
" lift | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" (26209) | \n",
" (47626) | \n",
" 0.060080 | \n",
" 0.197723 | \n",
" 3.001750 | \n",
"
\n",
" \n",
" 1 | \n",
" (47626) | \n",
" (26209) | \n",
" 0.065869 | \n",
" 0.180345 | \n",
" 3.001750 | \n",
"
\n",
" \n",
" 2 | \n",
" (47766) | \n",
" (21903) | \n",
" 0.075468 | \n",
" 0.175281 | \n",
" 1.699910 | \n",
"
\n",
" \n",
" 3 | \n",
" (21903) | \n",
" (47766) | \n",
" 0.103112 | \n",
" 0.128289 | \n",
" 1.699910 | \n",
"
\n",
" \n",
" 4 | \n",
" (47209) | \n",
" (27966) | \n",
" 0.090483 | \n",
" 0.121389 | \n",
" 2.077938 | \n",
"
\n",
" \n",
" 5 | \n",
" (27966) | \n",
" (47209) | \n",
" 0.058418 | \n",
" 0.188018 | \n",
" 2.077938 | \n",
"
\n",
" \n",
" 6 | \n",
" (24852) | \n",
" (21903) | \n",
" 0.201259 | \n",
" 0.108683 | \n",
" 1.054029 | \n",
"
\n",
" \n",
" 7 | \n",
" (21903) | \n",
" (24852) | \n",
" 0.103112 | \n",
" 0.212133 | \n",
" 1.054029 | \n",
"
\n",
" \n",
" 8 | \n",
" (13176) | \n",
" (21903) | \n",
" 0.161785 | \n",
" 0.133208 | \n",
" 1.291881 | \n",
"
\n",
" \n",
" 9 | \n",
" (21903) | \n",
" (13176) | \n",
" 0.103112 | \n",
" 0.209007 | \n",
" 1.291881 | \n",
"
\n",
" \n",
" 10 | \n",
" (13176) | \n",
" (27845) | \n",
" 0.161785 | \n",
" 0.069883 | \n",
" 1.194505 | \n",
"
\n",
" \n",
" 11 | \n",
" (27845) | \n",
" (13176) | \n",
" 0.058504 | \n",
" 0.193253 | \n",
" 1.194505 | \n",
"
\n",
" \n",
" 12 | \n",
" (26209) | \n",
" (24852) | \n",
" 0.060080 | \n",
" 0.225713 | \n",
" 1.121500 | \n",
"
\n",
" \n",
" 13 | \n",
" (24852) | \n",
" (26209) | \n",
" 0.201259 | \n",
" 0.067380 | \n",
" 1.121500 | \n",
"
\n",
" \n",
" 14 | \n",
" (24852) | \n",
" (16797) | \n",
" 0.201259 | \n",
" 0.087891 | \n",
" 1.435638 | \n",
"
\n",
" \n",
" 15 | \n",
" (16797) | \n",
" (24852) | \n",
" 0.061221 | \n",
" 0.288936 | \n",
" 1.435638 | \n",
"
\n",
" \n",
" 16 | \n",
" (21137) | \n",
" (47209) | \n",
" 0.112891 | \n",
" 0.153616 | \n",
" 1.697728 | \n",
"
\n",
" \n",
" 17 | \n",
" (47209) | \n",
" (21137) | \n",
" 0.090483 | \n",
" 0.191659 | \n",
" 1.697728 | \n",
"
\n",
" \n",
" 18 | \n",
" (49683) | \n",
" (24852) | \n",
" 0.040854 | \n",
" 0.329296 | \n",
" 1.636175 | \n",
"
\n",
" \n",
" 19 | \n",
" (24852) | \n",
" (49683) | \n",
" 0.201259 | \n",
" 0.066844 | \n",
" 1.636175 | \n",
"
\n",
" \n",
" 20 | \n",
" (21137) | \n",
" (39275) | \n",
" 0.112891 | \n",
" 0.090809 | \n",
" 2.110653 | \n",
"
\n",
" \n",
" 21 | \n",
" (39275) | \n",
" (21137) | \n",
" 0.043024 | \n",
" 0.238274 | \n",
" 2.110653 | \n",
"
\n",
" \n",
" 22 | \n",
" (47626) | \n",
" (47766) | \n",
" 0.065869 | \n",
" 0.160244 | \n",
" 2.123337 | \n",
"
\n",
" \n",
" 23 | \n",
" (47766) | \n",
" (47626) | \n",
" 0.075468 | \n",
" 0.139862 | \n",
" 2.123337 | \n",
"
\n",
" \n",
" 24 | \n",
" (24852) | \n",
" (47766) | \n",
" 0.201259 | \n",
" 0.113194 | \n",
" 1.499886 | \n",
"
\n",
" \n",
" 25 | \n",
" (47766) | \n",
" (24852) | \n",
" 0.075468 | \n",
" 0.301866 | \n",
" 1.499886 | \n",
"
\n",
" \n",
" 26 | \n",
" (13176) | \n",
" (27966) | \n",
" 0.161785 | \n",
" 0.107065 | \n",
" 1.832734 | \n",
"
\n",
" \n",
" 27 | \n",
" (27966) | \n",
" (13176) | \n",
" 0.058418 | \n",
" 0.296508 | \n",
" 1.832734 | \n",
"
\n",
" \n",
" 28 | \n",
" (21137) | \n",
" (27845) | \n",
" 0.112891 | \n",
" 0.089873 | \n",
" 1.536191 | \n",
"
\n",
" \n",
" 29 | \n",
" (27845) | \n",
" (21137) | \n",
" 0.058504 | \n",
" 0.173423 | \n",
" 1.536191 | \n",
"
\n",
" \n",
" 30 | \n",
" (13176) | \n",
" (47209) | \n",
" 0.161785 | \n",
" 0.163981 | \n",
" 1.812281 | \n",
"
\n",
" \n",
" 31 | \n",
" (47209) | \n",
" (13176) | \n",
" 0.090483 | \n",
" 0.293199 | \n",
" 1.812281 | \n",
"
\n",
" \n",
" 32 | \n",
" (47209) | \n",
" (21903) | \n",
" 0.090483 | \n",
" 0.163679 | \n",
" 1.587393 | \n",
"
\n",
" \n",
" 33 | \n",
" (21903) | \n",
" (47209) | \n",
" 0.103112 | \n",
" 0.143632 | \n",
" 1.587393 | \n",
"
\n",
" \n",
" 34 | \n",
" (45066) | \n",
" (24852) | \n",
" 0.034132 | \n",
" 0.355725 | \n",
" 1.767494 | \n",
"
\n",
" \n",
" 35 | \n",
" (24852) | \n",
" (45066) | \n",
" 0.201259 | \n",
" 0.060329 | \n",
" 1.767494 | \n",
"
\n",
" \n",
" 36 | \n",
" (21137) | \n",
" (21903) | \n",
" 0.112891 | \n",
" 0.144326 | \n",
" 1.399704 | \n",
"
\n",
" \n",
" 37 | \n",
" (21903) | \n",
" (21137) | \n",
" 0.103112 | \n",
" 0.158014 | \n",
" 1.399704 | \n",
"
\n",
" \n",
" 38 | \n",
" (24852) | \n",
" (27845) | \n",
" 0.201259 | \n",
" 0.066529 | \n",
" 1.137168 | \n",
"
\n",
" \n",
" 39 | \n",
" (27845) | \n",
" (24852) | \n",
" 0.058504 | \n",
" 0.228866 | \n",
" 1.137168 | \n",
"
\n",
" \n",
" 40 | \n",
" (28204) | \n",
" (24852) | \n",
" 0.038052 | \n",
" 0.378441 | \n",
" 1.880364 | \n",
"
\n",
" \n",
" 41 | \n",
" (24852) | \n",
" (28204) | \n",
" 0.201259 | \n",
" 0.071552 | \n",
" 1.880364 | \n",
"
\n",
" \n",
" 42 | \n",
" (21137) | \n",
" (27966) | \n",
" 0.112891 | \n",
" 0.128940 | \n",
" 2.207206 | \n",
"
\n",
" \n",
" 43 | \n",
" (27966) | \n",
" (21137) | \n",
" 0.058418 | \n",
" 0.249174 | \n",
" 2.207206 | \n",
"
\n",
" \n",
" 44 | \n",
" (4920) | \n",
" (24852) | \n",
" 0.035537 | \n",
" 0.296906 | \n",
" 1.475240 | \n",
"
\n",
" \n",
" 45 | \n",
" (24852) | \n",
" (4920) | \n",
" 0.201259 | \n",
" 0.052425 | \n",
" 1.475240 | \n",
"
\n",
" \n",
" 46 | \n",
" (21137) | \n",
" (47766) | \n",
" 0.112891 | \n",
" 0.090980 | \n",
" 1.205542 | \n",
"
\n",
" \n",
" 47 | \n",
" (47766) | \n",
" (21137) | \n",
" 0.075468 | \n",
" 0.136095 | \n",
" 1.205542 | \n",
"
\n",
" \n",
" 48 | \n",
" (47626) | \n",
" (24852) | \n",
" 0.065869 | \n",
" 0.267663 | \n",
" 1.329938 | \n",
"
\n",
" \n",
" 49 | \n",
" (24852) | \n",
" (47626) | \n",
" 0.201259 | \n",
" 0.087602 | \n",
" 1.329938 | \n",
"
\n",
" \n",
" 50 | \n",
" (13176) | \n",
" (22935) | \n",
" 0.161785 | \n",
" 0.064756 | \n",
" 1.342844 | \n",
"
\n",
" \n",
" 51 | \n",
" (22935) | \n",
" (13176) | \n",
" 0.048223 | \n",
" 0.217252 | \n",
" 1.342844 | \n",
"
\n",
" \n",
" 52 | \n",
" (13176) | \n",
" (21137) | \n",
" 0.161785 | \n",
" 0.163832 | \n",
" 1.451233 | \n",
"
\n",
" \n",
" 53 | \n",
" (21137) | \n",
" (13176) | \n",
" 0.112891 | \n",
" 0.234787 | \n",
" 1.451233 | \n",
"
\n",
" \n",
" 54 | \n",
" (21137) | \n",
" (24852) | \n",
" 0.112891 | \n",
" 0.211665 | \n",
" 1.051702 | \n",
"
\n",
" \n",
" 55 | \n",
" (24852) | \n",
" (21137) | \n",
" 0.201259 | \n",
" 0.118728 | \n",
" 1.051702 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" antecedants consequents support confidence lift\n",
"0 (26209) (47626) 0.060080 0.197723 3.001750\n",
"1 (47626) (26209) 0.065869 0.180345 3.001750\n",
"2 (47766) (21903) 0.075468 0.175281 1.699910\n",
"3 (21903) (47766) 0.103112 0.128289 1.699910\n",
"4 (47209) (27966) 0.090483 0.121389 2.077938\n",
"5 (27966) (47209) 0.058418 0.188018 2.077938\n",
"6 (24852) (21903) 0.201259 0.108683 1.054029\n",
"7 (21903) (24852) 0.103112 0.212133 1.054029\n",
"8 (13176) (21903) 0.161785 0.133208 1.291881\n",
"9 (21903) (13176) 0.103112 0.209007 1.291881\n",
"10 (13176) (27845) 0.161785 0.069883 1.194505\n",
"11 (27845) (13176) 0.058504 0.193253 1.194505\n",
"12 (26209) (24852) 0.060080 0.225713 1.121500\n",
"13 (24852) (26209) 0.201259 0.067380 1.121500\n",
"14 (24852) (16797) 0.201259 0.087891 1.435638\n",
"15 (16797) (24852) 0.061221 0.288936 1.435638\n",
"16 (21137) (47209) 0.112891 0.153616 1.697728\n",
"17 (47209) (21137) 0.090483 0.191659 1.697728\n",
"18 (49683) (24852) 0.040854 0.329296 1.636175\n",
"19 (24852) (49683) 0.201259 0.066844 1.636175\n",
"20 (21137) (39275) 0.112891 0.090809 2.110653\n",
"21 (39275) (21137) 0.043024 0.238274 2.110653\n",
"22 (47626) (47766) 0.065869 0.160244 2.123337\n",
"23 (47766) (47626) 0.075468 0.139862 2.123337\n",
"24 (24852) (47766) 0.201259 0.113194 1.499886\n",
"25 (47766) (24852) 0.075468 0.301866 1.499886\n",
"26 (13176) (27966) 0.161785 0.107065 1.832734\n",
"27 (27966) (13176) 0.058418 0.296508 1.832734\n",
"28 (21137) (27845) 0.112891 0.089873 1.536191\n",
"29 (27845) (21137) 0.058504 0.173423 1.536191\n",
"30 (13176) (47209) 0.161785 0.163981 1.812281\n",
"31 (47209) (13176) 0.090483 0.293199 1.812281\n",
"32 (47209) (21903) 0.090483 0.163679 1.587393\n",
"33 (21903) (47209) 0.103112 0.143632 1.587393\n",
"34 (45066) (24852) 0.034132 0.355725 1.767494\n",
"35 (24852) (45066) 0.201259 0.060329 1.767494\n",
"36 (21137) (21903) 0.112891 0.144326 1.399704\n",
"37 (21903) (21137) 0.103112 0.158014 1.399704\n",
"38 (24852) (27845) 0.201259 0.066529 1.137168\n",
"39 (27845) (24852) 0.058504 0.228866 1.137168\n",
"40 (28204) (24852) 0.038052 0.378441 1.880364\n",
"41 (24852) (28204) 0.201259 0.071552 1.880364\n",
"42 (21137) (27966) 0.112891 0.128940 2.207206\n",
"43 (27966) (21137) 0.058418 0.249174 2.207206\n",
"44 (4920) (24852) 0.035537 0.296906 1.475240\n",
"45 (24852) (4920) 0.201259 0.052425 1.475240\n",
"46 (21137) (47766) 0.112891 0.090980 1.205542\n",
"47 (47766) (21137) 0.075468 0.136095 1.205542\n",
"48 (47626) (24852) 0.065869 0.267663 1.329938\n",
"49 (24852) (47626) 0.201259 0.087602 1.329938\n",
"50 (13176) (22935) 0.161785 0.064756 1.342844\n",
"51 (22935) (13176) 0.048223 0.217252 1.342844\n",
"52 (13176) (21137) 0.161785 0.163832 1.451233\n",
"53 (21137) (13176) 0.112891 0.234787 1.451233\n",
"54 (21137) (24852) 0.112891 0.211665 1.051702\n",
"55 (24852) (21137) 0.201259 0.118728 1.051702"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Create the rules\n",
"rules = association_rules(frequent_itemsets, metric=\"lift\", min_threshold=1)\n",
"rules"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" We can also see several where the confidence is high as well.or instance, we can see that there are quite a few rules with a high lift value which means that it occurs more frequently than would be expected given the number of transaction and product combinations. \n",
"\n",
"We can filter the dataframe using standard pandas code. In this case, look for a large lift (2) and high confidence (.1):"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" antecedants | \n",
" consequents | \n",
" support | \n",
" confidence | \n",
" lift | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" (26209) | \n",
" (47626) | \n",
" 0.060080 | \n",
" 0.197723 | \n",
" 3.001750 | \n",
"
\n",
" \n",
" 1 | \n",
" (47626) | \n",
" (26209) | \n",
" 0.065869 | \n",
" 0.180345 | \n",
" 3.001750 | \n",
"
\n",
" \n",
" 4 | \n",
" (47209) | \n",
" (27966) | \n",
" 0.090483 | \n",
" 0.121389 | \n",
" 2.077938 | \n",
"
\n",
" \n",
" 5 | \n",
" (27966) | \n",
" (47209) | \n",
" 0.058418 | \n",
" 0.188018 | \n",
" 2.077938 | \n",
"
\n",
" \n",
" 21 | \n",
" (39275) | \n",
" (21137) | \n",
" 0.043024 | \n",
" 0.238274 | \n",
" 2.110653 | \n",
"
\n",
" \n",
" 22 | \n",
" (47626) | \n",
" (47766) | \n",
" 0.065869 | \n",
" 0.160244 | \n",
" 2.123337 | \n",
"
\n",
" \n",
" 23 | \n",
" (47766) | \n",
" (47626) | \n",
" 0.075468 | \n",
" 0.139862 | \n",
" 2.123337 | \n",
"
\n",
" \n",
" 42 | \n",
" (21137) | \n",
" (27966) | \n",
" 0.112891 | \n",
" 0.128940 | \n",
" 2.207206 | \n",
"
\n",
" \n",
" 43 | \n",
" (27966) | \n",
" (21137) | \n",
" 0.058418 | \n",
" 0.249174 | \n",
" 2.207206 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" antecedants consequents support confidence lift\n",
"0 (26209) (47626) 0.060080 0.197723 3.001750\n",
"1 (47626) (26209) 0.065869 0.180345 3.001750\n",
"4 (47209) (27966) 0.090483 0.121389 2.077938\n",
"5 (27966) (47209) 0.058418 0.188018 2.077938\n",
"21 (39275) (21137) 0.043024 0.238274 2.110653\n",
"22 (47626) (47766) 0.065869 0.160244 2.123337\n",
"23 (47766) (47626) 0.075468 0.139862 2.123337\n",
"42 (21137) (27966) 0.112891 0.128940 2.207206\n",
"43 (27966) (21137) 0.058418 0.249174 2.207206"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"rules[ (rules['lift'] >= 2) &\n",
" (rules['confidence'] >= 0.1) ]"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.10"
}
},
"nbformat": 4,
"nbformat_minor": 2
}