{ "cells": [ { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "## Frequent Itemsets via Apriori Algorithm" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Apriori function to extract frequent itemsets for association rule mining" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "> from mlxtend.frequent_patterns import apriori" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "## Overview" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Apriori is a popular algorithm [1] for extracting frequent itemsets with applications in association rule learning. The apriori algorithm is designed to operate on databases containing transactions, such as purchases by customers of a store. An itemset is considered \"frequent\" if it meets a user-specified support threshold. For instance, if the support threshold is set to 0.5 (50%), a frequent itemset is defined as a set of items that occur together in at least 50% of all transactions in the database." ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "## References\n", "\n", "[1] Agrawal, Rakesh, and Ramakrishnan Srikant. \"[Fast algorithms for mining association rules](https://www.it.uu.se/edu/course/homepage/infoutv/ht08/vldb94_rj.pdf).\" Proc. 20th int. conf. very large data bases, VLDB. Vol. 1215. 1994." 
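] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "To make the support definition above concrete, here is a minimal sketch (not the `mlxtend` implementation) of how the support of a candidate itemset could be computed, assuming `transactions` is a list of item lists:\n", "\n", "```python\n", "def support(itemset, transactions):\n", "    # Fraction of transactions that contain every item in `itemset`\n", "    hits = sum(1 for t in transactions if set(itemset) <= set(t))\n", "    return hits / len(transactions)\n", "```\n", "\n", "Apriori avoids evaluating this for every possible itemset by exploiting the fact that a superset can never have higher support than any of its subsets."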
] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "## Example 1" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true, "deletable": true, "editable": true }, "source": [ "The `apriori` function expects data in a one-hot encoded pandas DataFrame.\n", "Suppose we have the following transaction data:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true, "deletable": true, "editable": true }, "outputs": [], "source": [ "dataset = [['Milk', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],\n", " ['Dill', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],\n", " ['Milk', 'Apple', 'Kidney Beans', 'Eggs'],\n", " ['Milk', 'Unicorn', 'Corn', 'Kidney Beans', 'Yogurt'],\n", " ['Corn', 'Onion', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs']]" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "We can transform it into the right format via the `OnehotTransactions` encoder as follows:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
AppleCornDillEggsIce creamKidney BeansMilkNutmegOnionUnicornYogurt
000010111101
100110101101
210010110000
301000110011
401011100100
\n", "
" ], "text/plain": [ " Apple Corn Dill Eggs Ice cream Kidney Beans Milk Nutmeg Onion \\\n", "0 0 0 0 1 0 1 1 1 1 \n", "1 0 0 1 1 0 1 0 1 1 \n", "2 1 0 0 1 0 1 1 0 0 \n", "3 0 1 0 0 0 1 1 0 0 \n", "4 0 1 0 1 1 1 0 0 1 \n", "\n", " Unicorn Yogurt \n", "0 0 1 \n", "1 0 1 \n", "2 0 0 \n", "3 1 1 \n", "4 0 0 " ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "from mlxtend.preprocessing import OnehotTransactions\n", "\n", "oht = OnehotTransactions()\n", "oht_ary = oht.fit(dataset).transform(dataset)\n", "df = pd.DataFrame(oht_ary, columns=oht.columns_)\n", "df" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Now, let us return the items and itemsets with at least 60% support:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
supportitemsets
00.8[3]
11.0[5]
20.6[6]
30.6[8]
40.6[10]
50.8[3, 5]
60.6[3, 8]
70.6[5, 6]
80.6[5, 8]
90.6[5, 10]
100.6[3, 5, 8]
\n", "
" ], "text/plain": [ " support itemsets\n", "0 0.8 [3]\n", "1 1.0 [5]\n", "2 0.6 [6]\n", "3 0.6 [8]\n", "4 0.6 [10]\n", "5 0.8 [3, 5]\n", "6 0.6 [3, 8]\n", "7 0.6 [5, 6]\n", "8 0.6 [5, 8]\n", "9 0.6 [5, 10]\n", "10 0.6 [3, 5, 8]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from mlxtend.frequent_patterns import apriori\n", "\n", "apriori(df, min_support=0.6)" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "By default, `apriori` returns the column indices of the items, which may be useful in downstream operations such as association rule mining. For better readability, we can set `use_colnames=True` to convert these integer values into the respective item names: " ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
supportitemsets
00.8[Eggs]
11.0[Kidney Beans]
20.6[Milk]
30.6[Onion]
40.6[Yogurt]
50.8[Eggs, Kidney Beans]
60.6[Eggs, Onion]
70.6[Kidney Beans, Milk]
80.6[Kidney Beans, Onion]
90.6[Kidney Beans, Yogurt]
100.6[Eggs, Kidney Beans, Onion]
\n", "
" ], "text/plain": [ " support itemsets\n", "0 0.8 [Eggs]\n", "1 1.0 [Kidney Beans]\n", "2 0.6 [Milk]\n", "3 0.6 [Onion]\n", "4 0.6 [Yogurt]\n", "5 0.8 [Eggs, Kidney Beans]\n", "6 0.6 [Eggs, Onion]\n", "7 0.6 [Kidney Beans, Milk]\n", "8 0.6 [Kidney Beans, Onion]\n", "9 0.6 [Kidney Beans, Yogurt]\n", "10 0.6 [Eggs, Kidney Beans, Onion]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "apriori(df, min_support=0.6, use_colnames=True)" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "## Example 2" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "The advantage of working with pandas `DataFrames` is that we can use its convenient features to filter the results. For instance, let's assume we are only interested in itemsets of length 2 that have a support of at least 80 percent. First, we create the frequent itemsets via `apriori` and add a new column that stores the length of each itemset:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
supportitemsetslength
00.8[Eggs]1
11.0[Kidney Beans]1
20.6[Milk]1
30.6[Onion]1
40.6[Yogurt]1
50.8[Eggs, Kidney Beans]2
60.6[Eggs, Onion]2
70.6[Kidney Beans, Milk]2
80.6[Kidney Beans, Onion]2
90.6[Kidney Beans, Yogurt]2
100.6[Eggs, Kidney Beans, Onion]3
\n", "
" ], "text/plain": [ " support itemsets length\n", "0 0.8 [Eggs] 1\n", "1 1.0 [Kidney Beans] 1\n", "2 0.6 [Milk] 1\n", "3 0.6 [Onion] 1\n", "4 0.6 [Yogurt] 1\n", "5 0.8 [Eggs, Kidney Beans] 2\n", "6 0.6 [Eggs, Onion] 2\n", "7 0.6 [Kidney Beans, Milk] 2\n", "8 0.6 [Kidney Beans, Onion] 2\n", "9 0.6 [Kidney Beans, Yogurt] 2\n", "10 0.6 [Eggs, Kidney Beans, Onion] 3" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "frequent_itemsets = apriori(df, min_support=0.6, use_colnames=True)\n", "frequent_itemsets['length'] = frequent_itemsets['itemsets'].apply(lambda x: len(x))\n", "frequent_itemsets" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Then, we can select the results that satisfy our desired criteria as follows:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
supportitemsetslength
50.8[Eggs, Kidney Beans]2
\n", "
" ], "text/plain": [ " support itemsets length\n", "5 0.8 [Eggs, Kidney Beans] 2" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "frequent_itemsets[ (frequent_itemsets['length'] == 2) &\n", " (frequent_itemsets['support'] >= 0.8) ]" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "## API" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "## apriori\n", "\n", "*apriori(df, min_support=0.5, use_colnames=False)*\n", "\n", "Get frequent itemsets from a one-hot DataFrame\n", "**Parameters**\n", "\n", "- `df` : pandas DataFrame\n", "\n", " pandas DataFrame in one-hot encoded format. For example\n", " ```\n", " Apple Bananas Beer Chicken Milk Rice\n", " 0 1 0 1 1 0 1\n", " 1 1 0 1 0 0 1\n", " 2 1 0 1 0 0 0\n", " 3 1 1 0 0 0 0\n", " 4 0 0 1 1 1 1\n", " 5 0 0 1 0 1 1\n", " 6 0 0 1 0 1 0\n", " 7 1 1 0 0 0 0\n", " ```\n", "\n", "- `min_support` : float (default: 0.5)\n", "\n", " A float between 0 and 1 for minimum support of the itemsets returned.\n", " The support is computed as the fraction\n", " transactions_where_item(s)_occur / total_transactions.\n", "\n", "- `use_colnames` : bool (default: False)\n", "\n", " If true, uses the DataFrame's column names in the returned DataFrame\n", " instead of column indices.\n", "**Returns**\n", "\n", "pandas DataFrame with columns ['support', 'itemsets'] of all itemsets\n", " that are >= min_support.\n", "\n", "\n" ] } ], "source": [ "with open('../../api_modules/mlxtend.frequent_patterns/apriori.md', 'r') as f:\n", " print(f.read())" ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": 
"python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.0" } }, "nbformat": 4, "nbformat_minor": 0 }