{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Machine Learning with ibmdbpy and Db2 Warehouse: Association Rules Mining \n",
    "\n",
    "This notebook explains how to perform association rules mining with the ibmdbpy Python package and data stored in Db2 Warehouse.\n",
    "___\n",
    "### Accelerate Python analytics with in-database processing by using ibmdbpy and IBM Db2 Warehouse\n",
    " \n",
    "The ibmdbpy project provides a Python interface for data manipulation and access to in-database algorithms in IBM Db2 Warehouse. It accelerates Python analytics by seamlessly pushing operations written in Python into the underlying database for execution, thereby benefitting from in-database performance-enhancing features, such as columnar storage and parallel processing. For more details about ibmdbpy, please refer to the [documentation](https://pythonhosted.org/ibmdbpy/index.html) and to the dedicated [Git repository](https://github.com/ibmdbanalytics/ibmdbpy/tree/master/ibmdbpy). This notebook provides you with an overview of ibmdbpy functionalities. \n",
    "\n",
    " \n",
    "__Prerequisites__\n",
    "* Db2 account: see [IBM Cloud](https://cloud.ibm.com/login) or [Db2 Warehouse](https://www.ibm.com/support/knowledgecenter/en/SSCJDQ/com.ibm.swg.im.dashdb.kc.doc/welcome.html)\n",
    "* Db2 driver: learn more on [IBM Knowledge Center](https://www.ibm.com/support/knowledgecenter/en/SSFMBX/com.ibm.swg.im.dashdb.doc/connecting/connect_applications_by_type.html) and see [IBM Support](https://www.ibm.com/support/pages/db2-jdbc-driver-versions-and-downloads)\n",
    "* Having installed the [ibmdbpy](https://pypi.org/project/ibmdbpy/) python library with pip: \n",
    "> pip install ibmdbpy \n",
    "* Optional dependency for JDBC is the [jaydebeapi](https://pypi.org/project/JayDeBeApi/) library. Run the following command to install ibmdbpy, as well as the dependencies for the JDBC feature:\n",
    "> pip install ibmdbpy[jdbc]\n",
    "* Sample data: follow this [steps](https://www.ibm.com/support/knowledgecenter/SSCJDQ/com.ibm.swg.im.dashdb.doc/admin/sampledata.html ) to download sample data for Db2.  \n",
    "\n",
    "__Association rules mining__\n",
    "\n",
    "Association rules mining is a common technique to better know your customers' habits. It answers the question: what items are commonly bought together? As it enables you to more efficiently analyze the shopping baskets of your customers, it also gives you hints about which promotions or actions to propose. For example, if flour, sugar and couverture chocolate are frequently bought together, discounting couverture chocolate might boost the sales of flour and sugar, because they are the basic ingredients you need for a tasty cake with a nice chocolate topping. You can find more details about associations rules mining and its support in Db2 Warehouse in the [IBM Knowledge Center](https://www.ibm.com/support/knowledgecenter/SS6NHC/com.ibm.swg.im.dashdb.analytics.doc/doc/r_association_rules.html). \n",
    "\n",
    "__Contents__\n",
    "\n",
    "This notebook uses the transactions dataset of an outdoor store. Let's see what patterns we find!\n",
    "\n",
    "__1. Get started__\n",
    "* Import what you need\n",
    "* Establish a JDBC connection to Db2 Warehouse\n",
    "* Explore the data\n",
    "    \n",
    "__2. Mine Association Rules with ibmdbpy__\n",
    "* Reminder about Association Rules Mining\n",
    "    * General task\n",
    "    * Definitions\n",
    "* Put your knowledge into practice\n",
    "    * Define an AssociationRules object\n",
    "    * Analyze the results\n",
    "    * Experiment for yourself with pruning!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Get started"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "__Import what you need__\n",
    "\n",
    "As usual, we first import some basic modules from ibmdbpy."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from ibmdbpy import IdaDataBase, IdaDataFrame"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "__Establish a JDBC connection to Db2 Warehouse__\n",
    "\n",
    "Please enter your credentials in order to connect to the database."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "#Enter the values for you database connection\n",
    "dsn_database = \"___\" # e.g. \"BLUDB\"\n",
    "dsn_hostname = \"___\" # e.g.: \"abc.url.example\"\n",
    "dsn_port = \"___\"    # e.g. \"50000\"\n",
    "dsn_uid = \"___\"     # e.g. \"db2_1234\"\n",
    "dsn_pwd = \"___\"     # e.g. \"zorglub\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "#Establish connection\n",
    "connection_string='jdbc:db2://'+dsn_hostname+':'+dsn_port+'/'+dsn_database+':user='+dsn_uid+';password='+dsn_pwd+\";\" \n",
    "idadb=IdaDataBase(dsn=connection_string, verbose = False)\n",
    "# Set verbose to True if you want to see the detail of ibmdbpy operations"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "By default the environment variable `AUTOCOMMIT` is then set to True, which means that every SQL statement which is submitted through the connection is executed within its own transaction and then committed implicitly. When you close the connection to Db2, if the environment variable `AUTOCOMMIT` is set to False, then all changes after the last explicit commit are discarded. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "__Explore the data__\n",
    "\n",
    "We use Db2 Warehouse sample data that can be downloaded [here](https://www.ibm.com/support/knowledgecenter/SSCJDQ/com.ibm.swg.im.dashdb.doc/admin/sampledata.html) . \n",
    "\n",
    "* GOSALES.ORDER_DETAILS contains data about transactions: product, price, quantity, shipment etc. \n",
    "* GOSALES.PRODUCT presents information about each product's characteristics: product type, color, brand etc. \n",
    "* GOSALES.PRODUCT_TYPE contains the name of each product type in various languages such as arabic, german, english, corean etc. \n",
    "* GOSALES.PRODUCT_NAME_LOOKUP contains, for each product, its name in all contries' language where it is sold. \n",
    "\n",
    "Let's define IdaDataFrames to explore our data."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* __Explore the original tables__"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>ORDER_DETAIL_CODE</th>\n",
       "      <th>ORDER_NUMBER</th>\n",
       "      <th>SHIP_DATE</th>\n",
       "      <th>PRODUCT_NUMBER</th>\n",
       "      <th>PROMOTION_CODE</th>\n",
       "      <th>QUANTITY</th>\n",
       "      <th>UNIT_COST</th>\n",
       "      <th>UNIT_PRICE</th>\n",
       "      <th>UNIT_SALE_PRICE</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1000001</td>\n",
       "      <td>100001</td>\n",
       "      <td>2010-01-19 00:00:00</td>\n",
       "      <td>41110</td>\n",
       "      <td>0</td>\n",
       "      <td>256</td>\n",
       "      <td>15.62</td>\n",
       "      <td>35.09</td>\n",
       "      <td>33.69</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1000002</td>\n",
       "      <td>100001</td>\n",
       "      <td>2010-02-17 00:00:00</td>\n",
       "      <td>69110</td>\n",
       "      <td>0</td>\n",
       "      <td>92</td>\n",
       "      <td>49.69</td>\n",
       "      <td>110.00</td>\n",
       "      <td>102.30</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1000013</td>\n",
       "      <td>100002</td>\n",
       "      <td>2010-01-19 00:00:00</td>\n",
       "      <td>75110</td>\n",
       "      <td>0</td>\n",
       "      <td>162</td>\n",
       "      <td>80.00</td>\n",
       "      <td>119.69</td>\n",
       "      <td>111.31</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1000014</td>\n",
       "      <td>100002</td>\n",
       "      <td>2010-01-19 00:00:00</td>\n",
       "      <td>76110</td>\n",
       "      <td>0</td>\n",
       "      <td>172</td>\n",
       "      <td>23.53</td>\n",
       "      <td>40.52</td>\n",
       "      <td>38.90</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1000015</td>\n",
       "      <td>100002</td>\n",
       "      <td>2010-01-19 00:00:00</td>\n",
       "      <td>85110</td>\n",
       "      <td>0</td>\n",
       "      <td>74</td>\n",
       "      <td>176.47</td>\n",
       "      <td>359.60</td>\n",
       "      <td>334.43</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   ORDER_DETAIL_CODE  ORDER_NUMBER            SHIP_DATE  PRODUCT_NUMBER  \\\n",
       "0            1000001        100001  2010-01-19 00:00:00           41110   \n",
       "1            1000002        100001  2010-02-17 00:00:00           69110   \n",
       "2            1000013        100002  2010-01-19 00:00:00           75110   \n",
       "3            1000014        100002  2010-01-19 00:00:00           76110   \n",
       "4            1000015        100002  2010-01-19 00:00:00           85110   \n",
       "\n",
       "   PROMOTION_CODE  QUANTITY  UNIT_COST  UNIT_PRICE  UNIT_SALE_PRICE  \n",
       "0               0       256      15.62       35.09            33.69  \n",
       "1               0        92      49.69      110.00           102.30  \n",
       "2               0       162      80.00      119.69           111.31  \n",
       "3               0       172      23.53       40.52            38.90  \n",
       "4               0        74     176.47      359.60           334.43  "
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "orders = IdaDataFrame(idadb, \"GOSALES.ORDER_DETAILS\")\n",
    "orders.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Distinct transactions: 53256\n"
     ]
    }
   ],
   "source": [
    "o1 = orders['ORDER_NUMBER'].count_distinct()\n",
    "print(\"Distinct transactions: %s\" %o1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We will apply the association rules mining algorithm to the `orders` IdaDataFrame : it contains both the transaction IDs (column `ORDER_NUMBER`) and the product IDs (column `PRODUCT_NUMBER`). The other tables are useful to get some additional or contextual information."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>PRODUCT_NUMBER</th>\n",
       "      <th>BASE_PRODUCT_NUMBER</th>\n",
       "      <th>INTRODUCTION_DATE</th>\n",
       "      <th>DISCONTINUED_DATE</th>\n",
       "      <th>PRODUCT_TYPE_CODE</th>\n",
       "      <th>PRODUCT_COLOR_CODE</th>\n",
       "      <th>PRODUCT_SIZE_CODE</th>\n",
       "      <th>PRODUCT_BRAND_CODE</th>\n",
       "      <th>PRODUCT_IMAGE</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1110</td>\n",
       "      <td>1</td>\n",
       "      <td>2001-02-15 00:00:00</td>\n",
       "      <td>None</td>\n",
       "      <td>951</td>\n",
       "      <td>908</td>\n",
       "      <td>808</td>\n",
       "      <td>701</td>\n",
       "      <td>P01CE1CG1.jpg</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2110</td>\n",
       "      <td>2</td>\n",
       "      <td>2001-02-15 00:00:00</td>\n",
       "      <td>None</td>\n",
       "      <td>951</td>\n",
       "      <td>906</td>\n",
       "      <td>807</td>\n",
       "      <td>701</td>\n",
       "      <td>P02CE1CG1.jpg</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3110</td>\n",
       "      <td>3</td>\n",
       "      <td>2001-02-15 00:00:00</td>\n",
       "      <td>None</td>\n",
       "      <td>951</td>\n",
       "      <td>924</td>\n",
       "      <td>825</td>\n",
       "      <td>701</td>\n",
       "      <td>P03CE1CG1.jpg</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4110</td>\n",
       "      <td>4</td>\n",
       "      <td>2001-02-15 00:00:00</td>\n",
       "      <td>None</td>\n",
       "      <td>951</td>\n",
       "      <td>923</td>\n",
       "      <td>804</td>\n",
       "      <td>701</td>\n",
       "      <td>P04CE1CG1.jpg</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5110</td>\n",
       "      <td>5</td>\n",
       "      <td>2001-02-15 00:00:00</td>\n",
       "      <td>None</td>\n",
       "      <td>951</td>\n",
       "      <td>923</td>\n",
       "      <td>823</td>\n",
       "      <td>701</td>\n",
       "      <td>P05CE1CG1.jpg</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   PRODUCT_NUMBER  BASE_PRODUCT_NUMBER    INTRODUCTION_DATE DISCONTINUED_DATE  \\\n",
       "0            1110                    1  2001-02-15 00:00:00              None   \n",
       "1            2110                    2  2001-02-15 00:00:00              None   \n",
       "2            3110                    3  2001-02-15 00:00:00              None   \n",
       "3            4110                    4  2001-02-15 00:00:00              None   \n",
       "4            5110                    5  2001-02-15 00:00:00              None   \n",
       "\n",
       "   PRODUCT_TYPE_CODE  PRODUCT_COLOR_CODE  PRODUCT_SIZE_CODE  \\\n",
       "0                951                 908                808   \n",
       "1                951                 906                807   \n",
       "2                951                 924                825   \n",
       "3                951                 923                804   \n",
       "4                951                 923                823   \n",
       "\n",
       "   PRODUCT_BRAND_CODE  PRODUCT_IMAGE  \n",
       "0                 701  P01CE1CG1.jpg  \n",
       "1                 701  P02CE1CG1.jpg  \n",
       "2                 701  P03CE1CG1.jpg  \n",
       "3                 701  P04CE1CG1.jpg  \n",
       "4                 701  P05CE1CG1.jpg  "
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "products = IdaDataFrame(idadb, \"GOSALES.PRODUCT\")\n",
    "products.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "How many different products do we have?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Distinct products: 274\n"
     ]
    }
   ],
   "source": [
    "n = products['PRODUCT_NUMBER'].count_distinct()\n",
    "print(\"Distinct products: %s\" %n)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The only common column of ORDER_DETAILS and PRODUCT is `PRODUCT_NUMBER`.\n",
    "\n",
    "The only common column of PRODUCT and PRODUCT_TYPE is `PRODUCT_TYPE_CODE`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>PRODUCT_TYPE_CODE</th>\n",
       "      <th>PRODUCT_LINE_CODE</th>\n",
       "      <th>PRODUCT_TYPE_EN</th>\n",
       "      <th>PRODUCT_TYPE_AR</th>\n",
       "      <th>PRODUCT_TYPE_CS</th>\n",
       "      <th>PRODUCT_TYPE_DA</th>\n",
       "      <th>PRODUCT_TYPE_DE</th>\n",
       "      <th>PRODUCT_TYPE_EL</th>\n",
       "      <th>PRODUCT_TYPE_ES</th>\n",
       "      <th>PRODUCT_TYPE_FI</th>\n",
       "      <th>...</th>\n",
       "      <th>PRODUCT_TYPE_PL</th>\n",
       "      <th>PRODUCT_TYPE_PT</th>\n",
       "      <th>PRODUCT_TYPE_RO</th>\n",
       "      <th>PRODUCT_TYPE_RU</th>\n",
       "      <th>PRODUCT_TYPE_SC</th>\n",
       "      <th>PRODUCT_TYPE_SL</th>\n",
       "      <th>PRODUCT_TYPE_SV</th>\n",
       "      <th>PRODUCT_TYPE_TC</th>\n",
       "      <th>PRODUCT_TYPE_TH</th>\n",
       "      <th>PRODUCT_TYPE_TR</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>951</td>\n",
       "      <td>991</td>\n",
       "      <td>Cooking Gear</td>\n",
       "      <td>معدات الطهى</td>\n",
       "      <td>Potřeby pro vaření</td>\n",
       "      <td>Kogegrej</td>\n",
       "      <td>Kochzubehör</td>\n",
       "      <td>Εξοπλισμός μαγειρικής</td>\n",
       "      <td>Material de cocina</td>\n",
       "      <td>Keittovälineet</td>\n",
       "      <td>...</td>\n",
       "      <td>Sprzęt kuchenny</td>\n",
       "      <td>Material cozinha</td>\n",
       "      <td>Echipament pentru gătit</td>\n",
       "      <td>Кухонные принадлежности</td>\n",
       "      <td>炊事用具</td>\n",
       "      <td>Oprema za kuhanje</td>\n",
       "      <td>Köksredskap</td>\n",
       "      <td>炊具</td>\n",
       "      <td>อุปกรณ์ทำอาหาร</td>\n",
       "      <td>Mutfak aletleri</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>952</td>\n",
       "      <td>991</td>\n",
       "      <td>Tents</td>\n",
       "      <td>الخيام</td>\n",
       "      <td>Stany</td>\n",
       "      <td>Telte</td>\n",
       "      <td>Zelte</td>\n",
       "      <td>Σκηνές</td>\n",
       "      <td>Tiendas</td>\n",
       "      <td>Teltat</td>\n",
       "      <td>...</td>\n",
       "      <td>Namioty</td>\n",
       "      <td>Barracas</td>\n",
       "      <td>Corturi</td>\n",
       "      <td>Палатки</td>\n",
       "      <td>帐篷</td>\n",
       "      <td>Šotori</td>\n",
       "      <td>Tält</td>\n",
       "      <td>營帳</td>\n",
       "      <td>เต็นท์</td>\n",
       "      <td>Çadırlar</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>953</td>\n",
       "      <td>991</td>\n",
       "      <td>Sleeping Bags</td>\n",
       "      <td>المنامات</td>\n",
       "      <td>Spacáky</td>\n",
       "      <td>Soveposer</td>\n",
       "      <td>Schlafsäcke</td>\n",
       "      <td>Υπνόσακοι</td>\n",
       "      <td>Sacos de dormir</td>\n",
       "      <td>Makuupussit</td>\n",
       "      <td>...</td>\n",
       "      <td>Śpiwory</td>\n",
       "      <td>Saco de dormir</td>\n",
       "      <td>Saci de dormit</td>\n",
       "      <td>Спальные мешки</td>\n",
       "      <td>睡袋</td>\n",
       "      <td>Spalne vreče</td>\n",
       "      <td>Sovsäckar</td>\n",
       "      <td>睡袋</td>\n",
       "      <td>ถุงนอน</td>\n",
       "      <td>Uyku Tulumları</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>954</td>\n",
       "      <td>991</td>\n",
       "      <td>Packs</td>\n",
       "      <td>الحقائب</td>\n",
       "      <td>Batohy</td>\n",
       "      <td>Rygsække</td>\n",
       "      <td>Rucksäcke</td>\n",
       "      <td>Σακίδια</td>\n",
       "      <td>Mochilas</td>\n",
       "      <td>Reput</td>\n",
       "      <td>...</td>\n",
       "      <td>Plecaki</td>\n",
       "      <td>Mochilas</td>\n",
       "      <td>Pachete</td>\n",
       "      <td>Рюкзаки и сумки</td>\n",
       "      <td>背包</td>\n",
       "      <td>Nahrbtniki</td>\n",
       "      <td>Ryggsäckar</td>\n",
       "      <td>背包</td>\n",
       "      <td>เป้</td>\n",
       "      <td>Sırt Çantaları</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>955</td>\n",
       "      <td>991</td>\n",
       "      <td>Lanterns</td>\n",
       "      <td>الفوانيس</td>\n",
       "      <td>Svítilny</td>\n",
       "      <td>Lygter</td>\n",
       "      <td>Lampen</td>\n",
       "      <td>Λάμπες</td>\n",
       "      <td>Linternas</td>\n",
       "      <td>Lyhdyt</td>\n",
       "      <td>...</td>\n",
       "      <td>Latarki</td>\n",
       "      <td>Lanternas</td>\n",
       "      <td>Lanterne</td>\n",
       "      <td>Фонари</td>\n",
       "      <td>手提灯</td>\n",
       "      <td>Svetilke</td>\n",
       "      <td>Lampor</td>\n",
       "      <td>營燈</td>\n",
       "      <td>ตะเกียง</td>\n",
       "      <td>Fenerler</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 31 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "   PRODUCT_TYPE_CODE  PRODUCT_LINE_CODE PRODUCT_TYPE_EN PRODUCT_TYPE_AR  \\\n",
       "0                951                991    Cooking Gear     معدات الطهى   \n",
       "1                952                991           Tents          الخيام   \n",
       "2                953                991   Sleeping Bags        المنامات   \n",
       "3                954                991           Packs         الحقائب   \n",
       "4                955                991        Lanterns        الفوانيس   \n",
       "\n",
       "      PRODUCT_TYPE_CS PRODUCT_TYPE_DA PRODUCT_TYPE_DE        PRODUCT_TYPE_EL  \\\n",
       "0  Potřeby pro vaření        Kogegrej     Kochzubehör  Εξοπλισμός μαγειρικής   \n",
       "1               Stany           Telte           Zelte                 Σκηνές   \n",
       "2             Spacáky       Soveposer     Schlafsäcke              Υπνόσακοι   \n",
       "3              Batohy        Rygsække       Rucksäcke                Σακίδια   \n",
       "4            Svítilny          Lygter          Lampen                 Λάμπες   \n",
       "\n",
       "      PRODUCT_TYPE_ES PRODUCT_TYPE_FI  ...  PRODUCT_TYPE_PL   PRODUCT_TYPE_PT  \\\n",
       "0  Material de cocina  Keittovälineet  ...  Sprzęt kuchenny  Material cozinha   \n",
       "1             Tiendas          Teltat  ...          Namioty          Barracas   \n",
       "2     Sacos de dormir     Makuupussit  ...          Śpiwory    Saco de dormir   \n",
       "3            Mochilas           Reput  ...          Plecaki          Mochilas   \n",
       "4           Linternas          Lyhdyt  ...          Latarki         Lanternas   \n",
       "\n",
       "           PRODUCT_TYPE_RO          PRODUCT_TYPE_RU PRODUCT_TYPE_SC  \\\n",
       "0  Echipament pentru gătit  Кухонные принадлежности            炊事用具   \n",
       "1                  Corturi                  Палатки              帐篷   \n",
       "2           Saci de dormit           Спальные мешки              睡袋   \n",
       "3                  Pachete          Рюкзаки и сумки              背包   \n",
       "4                 Lanterne                   Фонари             手提灯   \n",
       "\n",
       "     PRODUCT_TYPE_SL PRODUCT_TYPE_SV PRODUCT_TYPE_TC PRODUCT_TYPE_TH  \\\n",
       "0  Oprema za kuhanje     Köksredskap              炊具  อุปกรณ์ทำอาหาร   \n",
       "1             Šotori            Tält              營帳          เต็นท์   \n",
       "2       Spalne vreče       Sovsäckar              睡袋          ถุงนอน   \n",
       "3         Nahrbtniki      Ryggsäckar              背包             เป้   \n",
       "4           Svetilke          Lampor              營燈         ตะเกียง   \n",
       "\n",
       "   PRODUCT_TYPE_TR  \n",
       "0  Mutfak aletleri  \n",
       "1         Çadırlar  \n",
       "2   Uyku Tulumları  \n",
       "3   Sırt Çantaları  \n",
       "4         Fenerler  \n",
       "\n",
       "[5 rows x 31 columns]"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "types = IdaDataFrame(idadb, \"GOSALES.PRODUCT_TYPE\")\n",
    "types.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "How many different product types do we have?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Distinct product types: 21\n"
     ]
    }
   ],
   "source": [
    "t1 = types['PRODUCT_TYPE_CODE'].count_distinct()\n",
    "print('Distinct product types: %s' %t1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "PRODUCT_NAME_LOOKUP contains the names of the products in several languages as well as their description."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>PRODUCT_NUMBER</th>\n",
       "      <th>PRODUCT_LANGUAGE</th>\n",
       "      <th>PRODUCT_NAME</th>\n",
       "      <th>PRODUCT_DESCRIPTION</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1110</td>\n",
       "      <td>AR</td>\n",
       "      <td>حقيبة مياه TrailChef</td>\n",
       "      <td>حقيبة حفيفة قابلة للطي لحمل السوائل بطريقة سهل...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1110</td>\n",
       "      <td>EL</td>\n",
       "      <td>Δοχείο υγρών Μαρμίτα Σεφ</td>\n",
       "      <td>Ελαφρύ, πτυσσόμενο δοχείο για εύκολη μεταφορά ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1110</td>\n",
       "      <td>DE</td>\n",
       "      <td>TrailChef Wasserbeutel</td>\n",
       "      <td>Leichter, zusammenfaltbarer Beutel zum einfach...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1110</td>\n",
       "      <td>DA</td>\n",
       "      <td>Sahara Vandtaske</td>\n",
       "      <td>Sammentrykkelig letvægtstaske til væsker. Bred...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1110</td>\n",
       "      <td>CS</td>\n",
       "      <td>Vak na vodu Kuchtík</td>\n",
       "      <td>Lehký, skladný vak na tekutiny. Široké hrdlo u...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   PRODUCT_NUMBER PRODUCT_LANGUAGE              PRODUCT_NAME  \\\n",
       "0            1110               AR      حقيبة مياه TrailChef   \n",
       "1            1110               EL  Δοχείο υγρών Μαρμίτα Σεφ   \n",
       "2            1110               DE    TrailChef Wasserbeutel   \n",
       "3            1110               DA          Sahara Vandtaske   \n",
       "4            1110               CS       Vak na vodu Kuchtík   \n",
       "\n",
       "                                 PRODUCT_DESCRIPTION  \n",
       "0  حقيبة حفيفة قابلة للطي لحمل السوائل بطريقة سهل...  \n",
       "1  Ελαφρύ, πτυσσόμενο δοχείο για εύκολη μεταφορά ...  \n",
       "2  Leichter, zusammenfaltbarer Beutel zum einfach...  \n",
       "3  Sammentrykkelig letvægtstaske til væsker. Bred...  \n",
       "4  Lehký, skladný vak na tekutiny. Široké hrdlo u...  "
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "names = IdaDataFrame(idadb, \"GOSALES.PRODUCT_NAME_LOOKUP\")\n",
    "names.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here you see that products with different product numbers (IDs) may have the same product name in the same language, for example because there is just a variation of size or color. There are less product names and descriptions than product numbers."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>PRODUCT_NUMBER</th>\n",
       "      <th>PRODUCT_LANGUAGE</th>\n",
       "      <th>PRODUCT_NAME</th>\n",
       "      <th>PRODUCT_DESCRIPTION</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>264</th>\n",
       "      <td>151110</td>\n",
       "      <td>EN</td>\n",
       "      <td>Astro Pilot</td>\n",
       "      <td>This GPS has a high-contrast 86mm color screen...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>265</th>\n",
       "      <td>151120</td>\n",
       "      <td>EN</td>\n",
       "      <td>Astro Pilot</td>\n",
       "      <td>This GPS has a high-contrast 86mm color screen...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>266</th>\n",
       "      <td>151130</td>\n",
       "      <td>EN</td>\n",
       "      <td>Astro Pilot</td>\n",
       "      <td>This GPS has a high-contrast 86mm color screen...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>267</th>\n",
       "      <td>152110</td>\n",
       "      <td>EN</td>\n",
       "      <td>Sky Pilot</td>\n",
       "      <td>Small and lightweight, this GPS has a 55mm col...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>268</th>\n",
       "      <td>153110</td>\n",
       "      <td>EN</td>\n",
       "      <td>Auto Pilot</td>\n",
       "      <td>The Auto Pilot is an easy-to-use navigation sy...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>269</th>\n",
       "      <td>154110</td>\n",
       "      <td>EN</td>\n",
       "      <td>Kodiak</td>\n",
       "      <td>The Kodiak includes a tough leather strap, pre...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>270</th>\n",
       "      <td>154120</td>\n",
       "      <td>EN</td>\n",
       "      <td>Kodiak</td>\n",
       "      <td>The Kodiak includes a tough leather strap, pre...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>271</th>\n",
       "      <td>154130</td>\n",
       "      <td>EN</td>\n",
       "      <td>Kodiak</td>\n",
       "      <td>The Kodiak includes a tough leather strap, pre...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>272</th>\n",
       "      <td>154140</td>\n",
       "      <td>EN</td>\n",
       "      <td>Kodiak</td>\n",
       "      <td>The Kodiak includes a tough leather strap, pre...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>273</th>\n",
       "      <td>154150</td>\n",
       "      <td>EN</td>\n",
       "      <td>Kodiak</td>\n",
       "      <td>The Kodiak includes a tough leather strap, pre...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     PRODUCT_NUMBER PRODUCT_LANGUAGE PRODUCT_NAME  \\\n",
       "264          151110               EN  Astro Pilot   \n",
       "265          151120               EN  Astro Pilot   \n",
       "266          151130               EN  Astro Pilot   \n",
       "267          152110               EN    Sky Pilot   \n",
       "268          153110               EN   Auto Pilot   \n",
       "269          154110               EN       Kodiak   \n",
       "270          154120               EN       Kodiak   \n",
       "271          154130               EN       Kodiak   \n",
       "272          154140               EN       Kodiak   \n",
       "273          154150               EN       Kodiak   \n",
       "\n",
       "                                   PRODUCT_DESCRIPTION  \n",
       "264  This GPS has a high-contrast 86mm color screen...  \n",
       "265  This GPS has a high-contrast 86mm color screen...  \n",
       "266  This GPS has a high-contrast 86mm color screen...  \n",
       "267  Small and lightweight, this GPS has a 55mm col...  \n",
       "268  The Auto Pilot is an easy-to-use navigation sy...  \n",
       "269  The Kodiak includes a tough leather strap, pre...  \n",
       "270  The Kodiak includes a tough leather strap, pre...  \n",
       "271  The Kodiak includes a tough leather strap, pre...  \n",
       "272  The Kodiak includes a tough leather strap, pre...  \n",
       "273  The Kodiak includes a tough leather strap, pre...  "
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "names[names[\"PRODUCT_LANGUAGE\"]==\"EN\"].tail(10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "PRODUCT_NUMBER         274\n",
       "PRODUCT_LANGUAGE         1\n",
       "PRODUCT_NAME           144\n",
       "PRODUCT_DESCRIPTION    141\n",
       "dtype: int64"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "names[names[\"PRODUCT_LANGUAGE\"]==\"EN\"].count_distinct()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* __Create a view__\n",
    "\n",
    "\n",
    "As you can see from the results of the last statements, the value of PRODUCT_NAME does not always identify a single product. In addition it is sometimes not clear from name of product to which type of product it belongs. So let’s create the view PRODUCT_NAME_TYPE_LOOKUP using the GOALES.PRODUCT, GOSALES.PRODUCT_NAME_LOOKUP and GOSALES.PRODUCT_TYPE table where the concatenation of PRODUCT_NAME, PRODUCT_NUMBER and PRODUCT_TYPE_NAME can be used as a unique and understandable name for a product. \n",
    "\n",
    "This new view will be useful to get the full English name of a product in order to interpret the rules."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create view PRODUCT_NAME_TYPE_LOOKUP\n",
    "\n",
    "idadb.ida_query(\"create or replace view PRODUCT_NAME_TYPE_LOOKUP as select p.PRODUCT_NUMBER, PRODUCT_NAME, PRODUCT_TYPE_EN, PRODUCT_NAME || '-' || n.PRODUCT_NUMBER || '-' || PRODUCT_TYPE_EN as PRODUCT_NAME_TYPE from GOSALES.PRODUCT p, GOSALES.PRODUCT_NAME_LOOKUP n, GOSALES.PRODUCT_TYPE t where p.PRODUCT_NUMBER = n.PRODUCT_NUMBER and p.PRODUCT_TYPE_CODE = t.PRODUCT_TYPE_CODE and n.PRODUCT_LANGUAGE='EN';\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's take a quick look at it. We define an IdaDataFrame pointing at this view, with `PRODUCT_NUMBER` as indexer column."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>PRODUCT_NUMBER</th>\n",
       "      <th>PRODUCT_NAME</th>\n",
       "      <th>PRODUCT_TYPE_EN</th>\n",
       "      <th>PRODUCT_NAME_TYPE</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1110</td>\n",
       "      <td>TrailChef Water Bag</td>\n",
       "      <td>Cooking Gear</td>\n",
       "      <td>TrailChef Water Bag-1110-Cooking Gear</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2110</td>\n",
       "      <td>TrailChef Canteen</td>\n",
       "      <td>Cooking Gear</td>\n",
       "      <td>TrailChef Canteen-2110-Cooking Gear</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3110</td>\n",
       "      <td>TrailChef Kitchen Kit</td>\n",
       "      <td>Cooking Gear</td>\n",
       "      <td>TrailChef Kitchen Kit-3110-Cooking Gear</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4110</td>\n",
       "      <td>TrailChef Cup</td>\n",
       "      <td>Cooking Gear</td>\n",
       "      <td>TrailChef Cup-4110-Cooking Gear</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5110</td>\n",
       "      <td>TrailChef Cook Set</td>\n",
       "      <td>Cooking Gear</td>\n",
       "      <td>TrailChef Cook Set-5110-Cooking Gear</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   PRODUCT_NUMBER           PRODUCT_NAME PRODUCT_TYPE_EN  \\\n",
       "0            1110    TrailChef Water Bag    Cooking Gear   \n",
       "1            2110      TrailChef Canteen    Cooking Gear   \n",
       "2            3110  TrailChef Kitchen Kit    Cooking Gear   \n",
       "3            4110          TrailChef Cup    Cooking Gear   \n",
       "4            5110     TrailChef Cook Set    Cooking Gear   \n",
       "\n",
       "                         PRODUCT_NAME_TYPE  \n",
       "0    TrailChef Water Bag-1110-Cooking Gear  \n",
       "1      TrailChef Canteen-2110-Cooking Gear  \n",
       "2  TrailChef Kitchen Kit-3110-Cooking Gear  \n",
       "3          TrailChef Cup-4110-Cooking Gear  \n",
       "4     TrailChef Cook Set-5110-Cooking Gear  "
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Define IdaDataFrame\n",
    "idadf = IdaDataFrame(idadb, 'PRODUCT_NAME_TYPE_LOOKUP', indexer = 'PRODUCT_NUMBER')\n",
    "idadf.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We will use this view for association rules interpretation. Instead of displaying rules as a complicated list of IDs, we want to be able to read them directly as \"product A goes with product B\", in plain language. \n",
    "\n",
    "But as seen before,  products with distinct IDs may have the same product name. So in order to be able to differentiate them we concatenate the name and the ID. Additionally we concatenate the product type to have a better understanding of the product at first glance. This is what you see in the `PRODUCT_NAME_TYPE` column."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Distinct products: 274\n",
      "Distinct product names: 274\n"
     ]
    }
   ],
   "source": [
    "#SANITY CHECK: we should obtainthe same number of distinct items\n",
    "n1 = idadf['PRODUCT_NUMBER'].count_distinct()\n",
    "n2 = idadf['PRODUCT_NAME_TYPE'].count_distinct()\n",
    "\n",
    "print(\"Distinct products: %s\" %n1)\n",
    "print(\"Distinct product names: %s\" %n2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Right, we now have a good overview of our datasets. Let's mine association rules!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Mine Association Rules with ibmdbpy"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Reminder about Association Rules Mining\n",
    "\n",
    "\n",
    "__General task__\n",
    "\n",
    "A collection of instances, here the content of shopping baskets, is our input. We want to output rules to predict the values of any attribute from the values of other attributes: e.g. knowing some items of a shopping basket, what other products are likely to be purchased?\n",
    "\n",
    "* Definitions\n",
    "\n",
    "**Support of an itemset in a transaction list**: fraction of transactions of the dataset containing this itemset.\n",
    "> support(itemset) = number of transactions containing the itemset / total number of transactions in the dataset\n",
    "\n",
    "For example, if your transaction dataset is {{bread, peanut butter}, {milk, bread, eggs, bacon}, {milk, bread, butter}}, then the support of the itemset {bread, milk} is 2/3.\n",
    "\n",
    "**Frequent itemset**: an itemset whose support in your transaction list is greater than some threshold t.\n",
    "\n",
    "**Rule R**: an implication of type \"X--> Y\". For example: {peanut butter} --> {bread}, {milk, bread, eggs} --> {bacon, butter}, {beer, crips} --> {pizza}\n",
    "\n",
    "Note: \"-->\" means co-occurrence... not causality!\n",
    "\n",
    "**Rule head** and **rule body**: The principal parts of an association rule are the rule body (also referred to as antecedent) and the rule head (also referred to as consequent). For example, where X --> Y is an association rule: The item set X is the rule body. The item Y is the rule head.\n",
    "\n",
    "* Evaluation of association rules\n",
    "\n",
    "Take the association rule R: \"X-->Y\" where X and Y are two itemsets.\n",
    "Then the support of the rule R is defined by the fraction of transactions containing all the elements from both X and Y.\n",
    "> support(\"X-->Y\") = support(X ∪ Y)\n",
    "\n",
    "The confidence of the rule is the fraction of itemsets containing X and Y relatively to the number of itemsets containing X. \n",
    "> confidence(\"X-->Y\") = support(\"X-->Y\") / support(X) = support(X ∪ Y) / support(X)\n",
    "\n",
    "The lift of a rule can give a better assessment than its confidence because it considers both the confidence of the rule and the distribution of Y, hence the overall data set. It can be seen as the rule confidence divided by the support of the rule head Y.\n",
    "> lift (\"X-->Y\") = confidence(\"X-->Y\") / support(Y)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Put your knowledge into practice\n",
    "\n",
    "We use ibmdbpy library dedicated to machine learning. You can visit the documentation [here](https://pythonhosted.org/ibmdbpy/association_rules.html)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "import ibmdbpy.learn.association_rules as assoc"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "__Define an AssociationRules object__\n",
    "\n",
    "It is ready to be used for fitting and prediction:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [],
   "source": [
    "Assoc = assoc.AssociationRules(modelname=\"SHOPPING_RULES\", minsupport=0.04, maxlen=4, maxheadlen=1, minconf=0.2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The AssociationRules class provides an interface for using the `ASSOCRULES` and `PREDICT_ASSOCRULES` IDAX methods of Db2.\n",
    "\n",
    "* **minsupport**: The minimum fraction (0.0 - 1.0) or the minimum number (above 1) of transactions that must contain a pattern to be considered as frequent. By default, it ranges between >0.0 and <1.0 for a minimum fraction, or above 1 for a minimum number of transactions.\n",
    "\n",
    "* **maxlen**: The maximum length of a pattern or a rule, that is, the maximum number of items per pattern or rule. Default is 5.\n",
    "\n",
    "* **maxheadlen**: The maximum length of a rule head, that is, the maximum number of items that might belong to the item set on the right side of a rule. Increasing this value might significantly increase the number of detected rules.\n",
    "\n",
    "* **minconf** : float, optional. It is the minimum confidence that a rule must achieve to be kept in the model of the pattern. >=0.0 and <= 1, default is 0.5. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'modelname': 'SHOPPING_RULES',\n",
       " 'minsupport': 0.04,\n",
       " 'maxlen': 4,\n",
       " 'maxheadlen': 1,\n",
       " 'minconf': 0.2,\n",
       " 'nametable': None,\n",
       " 'namecol': None,\n",
       " 'outtable': None,\n",
       " 'type': None,\n",
       " 'limit': None,\n",
       " 'sort': None}"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#Inspect the object\n",
    "Assoc.get_params()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can see that many parameters, such as `nametable` and `namecol` have not been set yet. They can be set at fitting step. `nametable` is a table containing the explicit names of all products, with their IDs. We specify `namecol` to explicitly identify, in this table, the column containing the names."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "assocpatterns\n",
      "    ITEMSETID  ITEMID\n",
      "0           1       1\n",
      "1           2       2\n",
      "2           3       3\n",
      "3           4       4\n",
      "4           5       5\n",
      "5           6       6\n",
      "6           7       7\n",
      "7           8       8\n",
      "8           9       9\n",
      "9          10      10\n",
      "10         11      11\n",
      "11         12      12\n",
      "12         13      13\n",
      "13         14      14\n",
      "14         15      15\n",
      "15         16      16\n",
      "16         17      17\n",
      "17         18      18\n",
      "18         19      19\n",
      "19         20      20\n",
      "20         21      21\n",
      "21         22      22\n",
      "22         23      23\n",
      "23         24      24\n",
      "24         25      25\n",
      "25         26      26\n",
      "26         27      27\n",
      "27         28      28\n",
      "28         29      29\n",
      "29         30      30\n",
      "..        ...     ...\n",
      "46         47      47\n",
      "47         48      48\n",
      "48         49      49\n",
      "49         50      50\n",
      "50         51      51\n",
      "51         52      52\n",
      "52         53      53\n",
      "53         54      54\n",
      "54         57       5\n",
      "55         62      19\n",
      "56         64      38\n",
      "57         65      38\n",
      "58         55       4\n",
      "59         56       5\n",
      "60         58      10\n",
      "61         59      12\n",
      "62         60      14\n",
      "63         61      19\n",
      "64         63      38\n",
      "65         55      38\n",
      "66         56      38\n",
      "67         57      41\n",
      "68         58      38\n",
      "69         59      38\n",
      "70         60      38\n",
      "71         61      38\n",
      "72         62      41\n",
      "73         63      41\n",
      "74         64      42\n",
      "75         65      51\n",
      "\n",
      "[76 rows x 2 columns]\n",
      " \n",
      "assocpatterns_stats\n",
      "    ITEMSETID  LENGTH  COUNT   SUPPORT      LIFT  PRUNED\n",
      "0          54       1   2254  0.042324  1.000000       0\n",
      "1          57       2   2483  0.046624  2.976649       0\n",
      "2          17       1   3129  0.058754  1.000000       0\n",
      "3          23       1   2539  0.047675  1.000000       0\n",
      "4          60       2   2451  0.046023  2.040719       0\n",
      "5          19       1   6520  0.122428  1.000000       0\n",
      "6           9       1   2400  0.045065  1.000000       0\n",
      "7          15       1   3705  0.069570  1.000000       0\n",
      "8          30       1   2319  0.043544  1.000000       0\n",
      "9          35       1   2892  0.054304  1.000000       0\n",
      "10         63       2   4165  0.078207  2.536326       0\n",
      "11         56       2   2903  0.054510  2.525859       0\n",
      "12         51       1   5241  0.098411  1.000000       0\n",
      "13         50       1   3028  0.056857  1.000000       0\n",
      "14         46       1   4073  0.076480  1.000000       0\n",
      "15          5       1   5576  0.104702  1.000000       0\n",
      "16         10       1   5679  0.106636  1.000000       0\n",
      "17         34       1   3745  0.070321  1.000000       0\n",
      "18         41       1   7967  0.149598  1.000000       0\n",
      "19          4       1   5678  0.106617  1.000000       0\n",
      "20         28       1   2696  0.050623  1.000000       0\n",
      "21         14       1   5827  0.109415  1.000000       0\n",
      "22         18       1   3989  0.074902  1.000000       0\n",
      "23         61       2   3246  0.060951  2.415382       0\n",
      "24         26       1   4173  0.078357  1.000000       0\n",
      "25          7       1   2437  0.045760  1.000000       0\n",
      "26         29       1   4445  0.083465  1.000000       0\n",
      "27         53       1   2912  0.054679  1.000000       0\n",
      "28         21       1   4238  0.079578  1.000000       0\n",
      "29         38       1  10977  0.206118  1.000000       0\n",
      "..        ...     ...    ...       ...       ...     ...\n",
      "35         62       2   2349  0.044108  2.408292       0\n",
      "36         42       1   5102  0.095801  1.000000       0\n",
      "37         45       1   3958  0.074320  1.000000       0\n",
      "38         32       1   3777  0.070922  1.000000       0\n",
      "39         24       1   3027  0.056839  1.000000       0\n",
      "40          2       1   2447  0.045948  1.000000       0\n",
      "41         37       1   2686  0.050436  1.000000       0\n",
      "42         40       1   2230  0.041873  1.000000       0\n",
      "43         22       1   3543  0.066528  1.000000       0\n",
      "44         27       1   2343  0.043995  1.000000       0\n",
      "45          1       1   2511  0.047150  1.000000       0\n",
      "46         59       2   2164  0.040634  1.948925       0\n",
      "47         52       1   4288  0.080517  1.000000       0\n",
      "48          8       1   4527  0.085005  1.000000       0\n",
      "49         48       1   4131  0.077569  1.000000       0\n",
      "50         55       2   2497  0.046887  2.133576       0\n",
      "51         65       2   2341  0.043957  2.167066       0\n",
      "52          6       1   3763  0.070659  1.000000       0\n",
      "53         43       1   4247  0.079747  1.000000       0\n",
      "54         25       1   4201  0.078883  1.000000       0\n",
      "55         11       1   3795  0.071260  1.000000       0\n",
      "56         20       1   4273  0.080235  1.000000       0\n",
      "57         47       1   4543  0.085305  1.000000       0\n",
      "58         12       1   5387  0.101153  1.000000       0\n",
      "59         16       1   3926  0.073719  1.000000       0\n",
      "60         44       1   4459  0.083728  1.000000       0\n",
      "61         13       1   2677  0.050267  1.000000       0\n",
      "62         58       2   2492  0.046793  2.128928       0\n",
      "63         36       1   2877  0.054022  1.000000       0\n",
      "64          3       1   2670  0.050135  1.000000       0\n",
      "\n",
      "[65 rows x 6 columns]\n",
      " \n",
      "assocrules\n",
      "    RULEID  ITEMSETID  BODYID  HEADID  CONFIDENCE  PRUNED\n",
      "0        1         55       4      38    0.439768       0\n",
      "1        2         55      38       4    0.227476       0\n",
      "2        3         56       5      38    0.520624       0\n",
      "3        4         56      38       5    0.264462       0\n",
      "4        5         57       5      41    0.445301       0\n",
      "5        6         57      41       5    0.311661       0\n",
      "6        7         58      10      38    0.438810       0\n",
      "7        8         58      38      10    0.227020       0\n",
      "8        9         59      12      38    0.401708       0\n",
      "9       10         60      14      38    0.420628       0\n",
      "10      11         60      38      14    0.223285       0\n",
      "11      12         61      19      38    0.497853       0\n",
      "12      13         61      38      19    0.295709       0\n",
      "13      14         62      19      41    0.360276       0\n",
      "14      15         62      41      19    0.294841       0\n",
      "15      16         63      38      41    0.379430       0\n",
      "16      17         63      41      38    0.522781       0\n",
      "17      18         64      38      42    0.210713       0\n",
      "18      19         64      42      38    0.453352       0\n",
      "19      20         65      38      51    0.213264       0\n",
      "20      21         65      51      38    0.446670       0\n",
      " \n",
      "items\n",
      "    ITEMID    ITEM                         ITEMNAME  COUNT   SUPPORT\n",
      "0        1  122140             Bella-122140-Eyewear   2511  0.047150\n",
      "1        2  124110           Cat Eye-124110-Eyewear   2447  0.045948\n",
      "2        3  124140           Cat Eye-124140-Eyewear   2670  0.050135\n",
      "3        4  124190           Cat Eye-124190-Eyewear   5678  0.106617\n",
      "4        5  125110             Venue-125110-Watches   5576  0.104702\n",
      "5        6  125120             Venue-125120-Watches   3763  0.070659\n",
      "6        7  125150             Venue-125150-Watches   2437  0.045760\n",
      "7        8  126110             Dante-126110-Eyewear   4527  0.085005\n",
      "8        9  126130             Dante-126130-Eyewear   2400  0.045065\n",
      "9       10  126140             Dante-126140-Eyewear   5679  0.106636\n",
      "10      11  127110           Fairway-127110-Eyewear   3795  0.071260\n",
      "11      12  127130           Fairway-127130-Eyewear   5387  0.101153\n",
      "12      13  128130           Inferno-128130-Eyewear   2677  0.050267\n",
      "13      14  128140           Inferno-128140-Eyewear   5827  0.109415\n",
      "14      15  128150           Inferno-128150-Eyewear   3705  0.069570\n",
      "15      16  128200           Inferno-128200-Eyewear   3926  0.073719\n",
      "16      17  128210           Inferno-128210-Eyewear   3129  0.058754\n",
      "17      18  129110          Infinity-129110-Watches   3989  0.074902\n",
      "18      19  129130          Infinity-129130-Watches   6520  0.122428\n",
      "19      20  129150          Infinity-129150-Watches   4273  0.080235\n",
      "20      21  129180          Infinity-129180-Watches   4238  0.079578\n",
      "21      22  130110               Lux-130110-Watches   3543  0.066528\n",
      "22      23  130130               Lux-130130-Watches   2539  0.047675\n",
      "23      24  131120          Max Gizmo-131120-Knives   3027  0.056839\n",
      "24      25  132110           Maximus-132110-Eyewear   4201  0.078883\n",
      "25      26  132120           Maximus-132120-Eyewear   4173  0.078357\n",
      "26      27  132140           Maximus-132140-Eyewear   2343  0.043995\n",
      "27      28  132150           Maximus-132150-Eyewear   2696  0.050623\n",
      "28      29  132170           Maximus-132170-Eyewear   4445  0.083465\n",
      "29      30  134120       Pocket Gizmo-134120-Knives   2319  0.043544\n",
      "30      31  135110  Ranger Vision-135110-Binoculars   2540  0.047694\n",
      "31      32  135120  Ranger Vision-135120-Binoculars   3777  0.070922\n",
      "32      33  136140               Sam-136140-Watches   4292  0.080592\n",
      "33      34  143110            Trendi-143110-Eyewear   3745  0.070321\n",
      "34      35  143120            Trendi-143120-Eyewear   2892  0.054304\n",
      "35      36  144150                TX-144150-Watches   2877  0.054022\n",
      "36      37  144170                TX-144170-Watches   2686  0.050436\n",
      "37      38  144180                TX-144180-Watches  10977  0.206118\n",
      "38      39  145130            Legend-145130-Watches   4081  0.076630\n",
      "39      40  145160            Legend-145160-Watches   2230  0.041873\n",
      "40      41  145170            Legend-145170-Watches   7967  0.149598\n",
      "41      42  147110              Zone-147110-Eyewear   5102  0.095801\n",
      "42      43  147120              Zone-147120-Eyewear   4247  0.079747\n",
      "43      44  147130              Zone-147130-Eyewear   4459  0.083728\n",
      "44      45  147160              Zone-147160-Eyewear   3958  0.074320\n",
      "45      46  147170              Zone-147170-Eyewear   4073  0.076480\n",
      "46      47  147180              Zone-147180-Eyewear   4543  0.085305\n",
      "47      48  148110          Hawk Eye-148110-Eyewear   4131  0.077569\n",
      "48      49  148120          Hawk Eye-148120-Eyewear   3440  0.064594\n",
      "49      50  148130          Hawk Eye-148130-Eyewear   3028  0.056857\n",
      "50      51  149140             Retro-149140-Eyewear   5241  0.098411\n",
      "51      52  151110    Astro Pilot-151110-Navigation   4288  0.080517\n",
      "52      53  152110      Sky Pilot-152110-Navigation   2912  0.054679\n",
      "53      54  154150            Kodiak-154150-Watches   2254  0.042324\n",
      " \n"
     ]
    }
   ],
   "source": [
    "Assoc.fit(orders, transaction_id=\"ORDER_NUMBER\", item_id=\"PRODUCT_NUMBER\", nametable = \"PRODUCT_NAME_TYPE_LOOKUP\", namecol='PRODUCT_NAME_TYPE', verbose=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Understand the intermediary tables**\n",
    "\n",
    "In the cell above, intermediate tables have been printed because we have set `verbose` to True. You can also obtain them with `AssociationRules.describe(detail=True)`. Here is an explanation of these tables.\n",
    "\n",
    "The first table **assocpatterns** shows which itemset (`ITEMSETID` column) contains which items (`ITEMID` column). Only the frequent itemsets are shown i.e. itemsets that are eligible according to the constraints you have set on your AssociationRules object. \n",
    "\n",
    "For example, the table above tells you that 65 itemsets are considered to be frequent. \n",
    "\n",
    "    1:{1}, 2:{2}, 3:{3}, ... 55:{4,38},..., 59:{12,38},..., 62:{19,41},..., 64:{42,38}, 65:{38, 51} \n",
    "    \n",
    "Frequent itemset with itemsetid 1 contains product with itemid 1, frequent itemset with itemsetid 2 contains product with itemid 2, ..., frequent itemset with itemsetid 55 contains products with itemid 4 and 38, ... frequent itemset with itemsetid 65 contains products with itemid 38 and 51.\n",
    "    \n",
    "Note that the item IDs are the IDs of the items in table \"item\". They are not the original product IDs from the product table.\n",
    "\n",
    "The second table **assocpatterns_stats** provides for each itemset some metrics: its cardinal (`LENGTH`), its absolute frequency in the dataset (`COUNT`), its support and its lift, as well as whether it was \"pruned\".\n",
    "\n",
    "The third table **assocrules** gives you the rules that have been found, according to the constraints you have set. Here we have 21 rules. \n",
    "\n",
    "    * Rule 1: itemset 4 --> itemset 38 id est {4} --> {38}, corresponds to itemset 55: {4,38};\n",
    "    * Rule 3: itemset 5 --> itemset 38 id est {5} --> {38}, corresponds to itemset 56: {5,38};\n",
    "    * Rule 18: itemset 38 --> itemset 42 id est {38} --> {42}, corresponds to itemset 64: {38,42};\n",
    "    * Rule 21: itemset 51 --> itemset 38 id est {51} --> {38}, corresponds to itemset 65: {38,51}.\n",
    "    \n",
    "The **items** table provides the mapping between new item IDs (`ITEMID`) and original item IDs (`ITEM`). It uses the `PRODUCT_NAME_TYPE_LOOKUP`table for interpretation and also gives support and count for each item."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "__Analyze the results__\n",
    "\n",
    "Printing all these intermediary tables with new IDs can be confusing... The `describe()` method is there to provide you with an overview of the rules, with product names for better understanding."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Summary of the rules\n",
      "    ID                                               RULE   SUPPORT      LIFT  \\\n",
      "0   17     (Legend-145170-Watches) => (TX-144180-Watches)  0.078207  2.536326   \n",
      "1   16     (TX-144180-Watches) => (Legend-145170-Watches)  0.078207  2.536326   \n",
      "2   12   (Infinity-129130-Watches) => (TX-144180-Watches)  0.060951  2.415382   \n",
      "3   13   (TX-144180-Watches) => (Infinity-129130-Watches)  0.060951  2.415382   \n",
      "4    3      (Venue-125110-Watches) => (TX-144180-Watches)  0.054510  2.525859   \n",
      "5    4      (TX-144180-Watches) => (Venue-125110-Watches)  0.054510  2.525859   \n",
      "6    1    (Cat Eye-124190-Eyewear) => (TX-144180-Watches)  0.046887  2.133576   \n",
      "7    2    (TX-144180-Watches) => (Cat Eye-124190-Eyewear)  0.046887  2.133576   \n",
      "8    7      (Dante-126140-Eyewear) => (TX-144180-Watches)  0.046793  2.128928   \n",
      "9    8      (TX-144180-Watches) => (Dante-126140-Eyewear)  0.046793  2.128928   \n",
      "10   5  (Venue-125110-Watches) => (Legend-145170-Watches)  0.046624  2.976649   \n",
      "11   6  (Legend-145170-Watches) => (Venue-125110-Watches)  0.046624  2.976649   \n",
      "12  10    (Inferno-128140-Eyewear) => (TX-144180-Watches)  0.046023  2.040719   \n",
      "13  11    (TX-144180-Watches) => (Inferno-128140-Eyewear)  0.046023  2.040719   \n",
      "14  14  (Infinity-129130-Watches) => (Legend-145170-Wa...  0.044108  2.408292   \n",
      "15  15  (Legend-145170-Watches) => (Infinity-129130-Wa...  0.044108  2.408292   \n",
      "16  21      (Retro-149140-Eyewear) => (TX-144180-Watches)  0.043957  2.167066   \n",
      "17  20      (TX-144180-Watches) => (Retro-149140-Eyewear)  0.043957  2.167066   \n",
      "18  19       (Zone-147110-Eyewear) => (TX-144180-Watches)  0.043432  2.199480   \n",
      "19  18       (TX-144180-Watches) => (Zone-147110-Eyewear)  0.043432  2.199480   \n",
      "20   9    (Fairway-127130-Eyewear) => (TX-144180-Watches)  0.040634  1.948925   \n",
      "\n",
      "    CONFIDENCE  \n",
      "0     0.522781  \n",
      "1     0.379430  \n",
      "2     0.497853  \n",
      "3     0.295709  \n",
      "4     0.520624  \n",
      "5     0.264462  \n",
      "6     0.439768  \n",
      "7     0.227476  \n",
      "8     0.438810  \n",
      "9     0.227020  \n",
      "10    0.445301  \n",
      "11    0.311661  \n",
      "12    0.420628  \n",
      "13    0.223285  \n",
      "14    0.360276  \n",
      "15    0.294841  \n",
      "16    0.446670  \n",
      "17    0.213264  \n",
      "18    0.453352  \n",
      "19    0.210713  \n",
      "20    0.401708  \n"
     ]
    }
   ],
   "source": [
    "# You can print the tables again with the describe method:\n",
    "Assoc.describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* Comment\n",
    "\n",
    "It seems like in this outdoor store, clients often buy several watches or several pair of glasses at a time and also eyewear and watches together. For example if you buy a \"Legend\" watch you are likely to buy a \"Venue\" watch too. Same for \"Fairway\" eyewear  and \"TX\" watch, \"Retro\" eyewear  and \"TX\" watch etc. \n",
    "\n",
    "The best rule links the items \"TX-144180-Watches\" (support = 0.206118) and \"Legend-145170-Watches\" (support = 0.149598) and has the highest lift as well as the highest confidence. When a customer buys a Legend watch he is twice more likely to buy a TX watch as the others.\n",
    "\n",
    "\"Max Gizmo\" knives are \"frequent\" because their support was above the threshold but this product does not appear in any of the rules above because no rule using this product could satisfy the confidence constraint we defined."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* Confidence interpretation\n",
    "\n",
    "Don't be mislead by high confidence values! A rule with higher confidence could seem more accurate, because better supported. However we have to assess the accuracy of a rule relatively to the dataset. So a high confidence does not necessarily mean that there is a high level of implication in the rule. Sometimes it's just that Y is frequent (or much more frequent than X), so that support(X ∪ Y) / support(X) is high. That is why we use the lift metric.\n",
    "\n",
    "\n",
    "* Lift interpretation \n",
    "\n",
    "Remember that the value of lift gives a better assessment because it considers both the confidence of the rule and the distribution of Y, hence the overall data set. \n",
    "\n",
    "If the lift is **equal to 1**, then the probability of occurrence of the antecedent X and that of the consequent Y are independent of each other. This means that **no rule** can be drawn involving these two events.\n",
    "If the lift is **above 1**, then it tells us to which degree these two occurrences are **dependent on one another**, and we can try to use these rules to make prediction on future datasets.\n",
    "If the lift is **under 1**, then the items are **substitute to each other**. This means that presence of one item has a negative effect on presence of the other item and vice versa. If you buy product X you are less likely to buy product Y and vice versa."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "__Experiment for yourself with pruning__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Pruning enables you to remove rules and pattern which you are not interested in. Here we proceed by filtering the rules that have been selected above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Summary of the rules\n",
      "   ID                                              RULE   SUPPORT      LIFT  \\\n",
      "0  17    (Legend-145170-Watches) => (TX-144180-Watches)  0.078207  2.536326   \n",
      "1  16    (TX-144180-Watches) => (Legend-145170-Watches)  0.078207  2.536326   \n",
      "2  12  (Infinity-129130-Watches) => (TX-144180-Watches)  0.060951  2.415382   \n",
      "3  13  (TX-144180-Watches) => (Infinity-129130-Watches)  0.060951  2.415382   \n",
      "\n",
      "   CONFIDENCE  \n",
      "0    0.522781  \n",
      "1    0.379430  \n",
      "2    0.497853  \n",
      "3    0.295709  \n"
     ]
    }
   ],
   "source": [
    "Assoc.prune(itemsin=None, itemsout=None, minlen=1, maxlen=None, \n",
    "           minsupport=0.06, maxsupport=1, minlift=None, maxlift=None, \n",
    "           minconf=None, maxconf=None, reset=True)\n",
    "\n",
    "Assoc.describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note: when printing the intermediary table by setting the option `detail=True` in `describe()`, rules and patterns that have been removed from the model are tagged with a 1 in column `PRUNED`. As shown in the summary table above, only 4 rules remain with our current filtering constraints. Also note the symmetry of the lift and support metrics."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "___\n",
    "__Close the connection to Db2__\n",
    "\n",
    "Congratulations! You reached the end of this notebook. Before closing the notebook, close the connection to Db2 with a simple line of code."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Connection closed.\n"
     ]
    }
   ],
   "source": [
    "# Close the connection to IBM Db2 Warehouse\n",
    "idadb.close()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "__Congratulations!__ You are ready to perform association rules mining with ibmdbpy!\n",
    "\n",
    "\n",
    "## Where to go from here ?\n",
    "\n",
    "More examples of ibmdbpy capabilities and ML algorithms are available on this GitHub repository.\n",
    "\n",
    "* Getting started with ibmdbpy :\n",
    "        \n",
    "    [Basics](../GettingStarted/ibmdbpy_GettingStarted_1-basics.ipynb)\n",
    "\n",
    "    [GeoBasics](../GettingStarted/ibmdbpy_GettingStarted_2-geo_basics.ipynb)\n",
    "\n",
    "    [Extensive Guide](../GettingStarted/ibmdbpy_GettingStarted_3-geo_guide.ipynb)\n",
    "    \n",
    "\n",
    "* More practice : analyze the Museums dataset, understand how to create IdaDataFrames and IdaGeoDataFrames:\n",
    "        \n",
    "    [Preprocessing](../MuseumsUseCase/ibmdbpy_Museums_DataAnalysis_1-preprocessing.ipynb)\n",
    "\n",
    "    [Geospatial recommendation](../MuseumsUseCase/ibmdbpy_Museums_DataAnalysis_2-geospatial.ipynb)\n",
    "\n",
    "\n",
    "* Machine learning with ibmdbpy: \n",
    "        \n",
    "    [Naïve Bayes](../MachineLearning/ibmdbpy_NaiveBayes.ipynb)\n",
    "\n",
    "    [KMeans](../GettingStarted/ibmdbpy_GettingStarted_1-basics.ipynb) "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "________\n",
    "__Author__\n",
    "\n",
    "Eva Feillet, ML intern @ IBM Research & Development Lab, Germany. IBM Cloud and Cognitive Software, Böblingen, Germany."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}