{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "%matplotlib inline\n", "import pandas as pd\n", "\n", "import numpy as np\n", "from __future__ import division\n", "import itertools\n", "\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "plt.rcParams['axes.grid'] = False\n", "\n", "import logging\n", "logger = logging.getLogger()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "6 Frequent Itemsets\n", "===========" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "### 6.1 The Market-Basket Model\n", "Each **basket** consists of a set of **items** (an itemset).\n", "\n", "+ The number of items in a basket is small, much smaller than the total number of items.\n", "\n", "+ The number of baskets is usually very large.\n", "\n", "+ Baskets are sets, so in principle each item can appear only once in a basket.\n", "\n", "\n", "##### Definition of Frequent Itemsets\n", "A set of items that appears in many baskets is said to be \"frequent\".\n", "\n", "**support**: if ${I}$ is a set of items, the support of ${I}$ is the number of baskets for which ${I}$ is a subset.\n", "\n", "If $s$ is the support threshold, we say ${I}$ is frequent if its support is $s$ or more." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
|  | training | a | and | cat |
|---|---|---|---|---|
| dog | 4,6 | 2,3,7 | 1,2,7,8 | 1,2,3,6,7 |
| cat | 5,6 | 2,3,7 | 1,2,5,7 | NaN |
| and | 5 | 2,7 | NaN | NaN |
| a |  | NaN | NaN | NaN |
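
The definition of support in Section 6.1 can be turned into a few lines of Python. What follows is only a minimal sketch, not the notebook's own code: the helper names `support` and `frequent_itemsets`, the toy `baskets` list, and the threshold values are all hypothetical and chosen purely for illustration. The search is brute force over all candidate itemsets of a given size, which is only reasonable for very small data.

```python
from itertools import combinations


def support(itemset, baskets):
    """Number of baskets for which `itemset` is a subset."""
    itemset = frozenset(itemset)
    return sum(1 for basket in baskets if itemset <= basket)


def frequent_itemsets(baskets, size, s):
    """Brute force: every itemset of `size` items whose support is s or more."""
    items = sorted(set().union(*baskets))
    result = {}
    for candidate in combinations(items, size):
        count = support(candidate, baskets)
        if count >= s:
            result[candidate] = count
    return result


# Hypothetical toy baskets, invented for illustration only.
baskets = [
    frozenset({"milk", "bread"}),
    frozenset({"milk", "bread", "beer"}),
    frozenset({"bread", "beer"}),
    frozenset({"milk", "beer"}),
    frozenset({"milk", "bread"}),
]

print(support({"milk", "bread"}, baskets))
print(frequent_itemsets(baskets, size=2, s=3))
```

Storing each basket as a `frozenset` mirrors the assumption that baskets are sets, and the subset test `itemset <= basket` is exactly the condition "${I}$ is a subset of the basket" from the definition.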