{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Multi-Touch Attribution with Markov Chains\n",
    "\n",
    "This notebook explores implementing a multi-touch attribution model using Markov chains. The following articles \n",
    "explain this approach:\n",
    "* [A Beginner’s Guide to Channel Attribution Modeling in Marketing](https://www.analyticsvidhya.com/blog/2018/01/channel-attribution-modeling-using-markov-chains-in-r/)\n",
    "* [Marketing Multi-Channel Attribution model with R](https://analyzecore.com/2016/08/03/attribution-model-r-part-1/)\n",
    "* [Markov Chains in Python: Beginner Tutorial](https://www.datacamp.com/community/tutorials/markov-chains-python-tutorial)\n",
    "\n",
    "\n",
    "This is the graph we are using as our test case:<br>\n",
    "![](https://i0.wp.com/analyzecore.com/wp-content/uploads/2016/07/Screenshot-2016-07-22-14.26.50.png?w=938&ssl=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 102,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Import the Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 76,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Load the journey dataset and insert a \"Start\" column at the beginning, to explicitly mark the initial state.\n",
    "df = pd.read_csv('simple_conversion_data.csv', header=None)\n",
    "df.insert(0, 'Start', 'Start')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 77,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Start</th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Start</td>\n",
       "      <td>C1</td>\n",
       "      <td>C2</td>\n",
       "      <td>C3</td>\n",
       "      <td>Conversion</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Start</td>\n",
       "      <td>C1</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Start</td>\n",
       "      <td>C2</td>\n",
       "      <td>C3</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Start   0    1    2           3\n",
       "0  Start  C1   C2   C3  Conversion\n",
       "1  Start  C1  NaN  NaN         NaN\n",
       "2  Start  C2   C3  NaN         NaN"
      ]
     },
     "execution_count": 77,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Transform the Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 78,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Reshape the journey dataset to get a new, two-column dataset that contains each \n",
    "# transition. That is, each row in the new dataset will contain: (T, T+1) \n",
    "transitions = list()\n",
    "for n in range(0, len(df.columns)-1):\n",
    "    df_transition = pd.DataFrame({\n",
    "        't': df.iloc[:,n], \n",
    "        't_plus_1': df.iloc[:,n+1]\n",
    "    })\n",
    "    transitions.append(df_transition)\n",
    "df_transitions = pd.concat(transitions)\n",
    "\n",
    "# We can drop all rows with NaN values in state T (the starting state).\n",
    "# These represent journey's that have already been completed.\n",
    "df_transitions.dropna(subset=['t'], inplace=True)\n",
    "\n",
    "# Let's replace the NaN's in the T+1 columns with a value indicating that the user bailed.\n",
    "# This will avoid problems later when we do groupby functions on the dataframe.\n",
    "df_transitions.fillna('Exit', inplace=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 79,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>t</th>\n",
       "      <th>t_plus_1</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Start</td>\n",
       "      <td>C1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Start</td>\n",
       "      <td>C1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Start</td>\n",
       "      <td>C2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>C1</td>\n",
       "      <td>C2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>C1</td>\n",
       "      <td>Exit</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>C2</td>\n",
       "      <td>C3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>C2</td>\n",
       "      <td>C3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>C3</td>\n",
       "      <td>Exit</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>C3</td>\n",
       "      <td>Conversion</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       t    t_plus_1\n",
       "0  Start          C1\n",
       "1  Start          C1\n",
       "2  Start          C2\n",
       "0     C1          C2\n",
       "1     C1        Exit\n",
       "2     C2          C3\n",
       "0     C2          C3\n",
       "2     C3        Exit\n",
       "0     C3  Conversion"
      ]
     },
     "execution_count": 79,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_transitions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 80,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Create a separate lookup table with row counts for each starting state, t, in the \n",
    "# transitions table. We'll use this to calculate probabilities later, and it will  \n",
    "# help us avoid the slow performing df.apply() method.\n",
    "df_initial_state_counts = df_transitions.groupby(by=['t'], as_index=False).count()\n",
    "df_initial_state_counts.rename(columns={'t_plus_1':'count_of_t'}, inplace=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 81,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>t</th>\n",
       "      <th>count_of_t</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>C1</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>C2</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>C3</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Start</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       t  count_of_t\n",
       "0     C1           2\n",
       "1     C2           2\n",
       "2     C3           2\n",
       "3  Start           3"
      ]
     },
     "execution_count": 81,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_initial_state_counts"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Calculate Transition Probabilities"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 82,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Join the lookup table to the transitions table, to pull in the counts for each starting state, t.\n",
    "df_transitions = pd.merge(df_transitions, df_initial_state_counts, on='t', how='inner')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 83,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>t</th>\n",
       "      <th>t_plus_1</th>\n",
       "      <th>count_of_t</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Start</td>\n",
       "      <td>C1</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Start</td>\n",
       "      <td>C1</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Start</td>\n",
       "      <td>C2</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>C1</td>\n",
       "      <td>C2</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>C1</td>\n",
       "      <td>Exit</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>C2</td>\n",
       "      <td>C3</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>C2</td>\n",
       "      <td>C3</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>C3</td>\n",
       "      <td>Exit</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>C3</td>\n",
       "      <td>Conversion</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       t    t_plus_1  count_of_t\n",
       "0  Start          C1           3\n",
       "1  Start          C1           3\n",
       "2  Start          C2           3\n",
       "3     C1          C2           2\n",
       "4     C1        Exit           2\n",
       "5     C2          C3           2\n",
       "6     C2          C3           2\n",
       "7     C3        Exit           2\n",
       "8     C3  Conversion           2"
      ]
     },
     "execution_count": 83,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_transitions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 84,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Calculate the individual probability for each transition instance\n",
    "df_transitions['probability'] = 1/df_transitions['count_of_t']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 85,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>t</th>\n",
       "      <th>t_plus_1</th>\n",
       "      <th>count_of_t</th>\n",
       "      <th>probability</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Start</td>\n",
       "      <td>C1</td>\n",
       "      <td>3</td>\n",
       "      <td>0.333333</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Start</td>\n",
       "      <td>C1</td>\n",
       "      <td>3</td>\n",
       "      <td>0.333333</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Start</td>\n",
       "      <td>C2</td>\n",
       "      <td>3</td>\n",
       "      <td>0.333333</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>C1</td>\n",
       "      <td>C2</td>\n",
       "      <td>2</td>\n",
       "      <td>0.500000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>C1</td>\n",
       "      <td>Exit</td>\n",
       "      <td>2</td>\n",
       "      <td>0.500000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>C2</td>\n",
       "      <td>C3</td>\n",
       "      <td>2</td>\n",
       "      <td>0.500000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>C2</td>\n",
       "      <td>C3</td>\n",
       "      <td>2</td>\n",
       "      <td>0.500000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>C3</td>\n",
       "      <td>Exit</td>\n",
       "      <td>2</td>\n",
       "      <td>0.500000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>C3</td>\n",
       "      <td>Conversion</td>\n",
       "      <td>2</td>\n",
       "      <td>0.500000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       t    t_plus_1  count_of_t  probability\n",
       "0  Start          C1           3     0.333333\n",
       "1  Start          C1           3     0.333333\n",
       "2  Start          C2           3     0.333333\n",
       "3     C1          C2           2     0.500000\n",
       "4     C1        Exit           2     0.500000\n",
       "5     C2          C3           2     0.500000\n",
       "6     C2          C3           2     0.500000\n",
       "7     C3        Exit           2     0.500000\n",
       "8     C3  Conversion           2     0.500000"
      ]
     },
     "execution_count": 85,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_transitions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 86,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Calculate the total probability of transitioning from one state to another\n",
    "df_transition_prob = df_transitions.groupby(by=['t', 't_plus_1'], as_index=False).sum()\n",
    "df_transition_prob.drop(['count_of_t'], axis=1, inplace=True)  # We don't need this column anymore"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 87,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>t</th>\n",
       "      <th>t_plus_1</th>\n",
       "      <th>probability</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>C1</td>\n",
       "      <td>C2</td>\n",
       "      <td>0.500000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>C1</td>\n",
       "      <td>Exit</td>\n",
       "      <td>0.500000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>C2</td>\n",
       "      <td>C3</td>\n",
       "      <td>1.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>C3</td>\n",
       "      <td>Conversion</td>\n",
       "      <td>0.500000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>C3</td>\n",
       "      <td>Exit</td>\n",
       "      <td>0.500000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Start</td>\n",
       "      <td>C1</td>\n",
       "      <td>0.666667</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Start</td>\n",
       "      <td>C2</td>\n",
       "      <td>0.333333</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       t    t_plus_1  probability\n",
       "0     C1          C2     0.500000\n",
       "1     C1        Exit     0.500000\n",
       "2     C2          C3     1.000000\n",
       "3     C3  Conversion     0.500000\n",
       "4     C3        Exit     0.500000\n",
       "5  Start          C1     0.666667\n",
       "6  Start          C2     0.333333"
      ]
     },
     "execution_count": 87,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_transition_prob"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 88,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>t</th>\n",
       "      <th>probability</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>C1</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>C2</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>C3</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Start</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       t  probability\n",
       "0     C1          1.0\n",
       "1     C2          1.0\n",
       "2     C3          1.0\n",
       "3  Start          1.0"
      ]
     },
     "execution_count": 88,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Double-check to make sure the total probability for each starting state, t, equals 1.0 (i.e. 100%)\n",
    "df_test = df_transition_prob.groupby(by='t', as_index=False).sum()\n",
    "df_test"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 89,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " \n",
      "The probability calculation test passed!!! :)\n",
      " \n"
     ]
    }
   ],
   "source": [
    "print(' ')\n",
    "if df_test['probability'].sum() != len(df_test):\n",
    "    print('[ERROR]: The probability calculation test failed. :(')\n",
    "else:\n",
    "    print('The probability calculation test passed!!! :)')\n",
    "print(' ')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. Calculate Total Conversion Probability"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 90,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def print_node(node, debug=False):\n",
    "    # Print each node as it's processed. For debugging purposes.\n",
    "    \n",
    "    if not debug:\n",
    "        return\n",
    "    \n",
    "    if node['t_plus_1'] in ['Exit', 'Conversion']:\n",
    "        node_type = 'Leaf'\n",
    "    else:\n",
    "        node_type = 'Parent'\n",
    "    \n",
    "    print('%s > %s' % (node['t'], node['t_plus_1']))\n",
    "    print('Type: %s' % node_type)\n",
    "    print('Prob: %0.2f' % node['probability'])\n",
    "    print('----------------------------')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 91,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "calculated_node_probabilities = dict()\n",
    "\n",
    "def calc_conversion_probability(starting_state, df_transitions, cum_probability, calculated_nodes, debug=True):\n",
    "    # Calculates the cumulative probability of reaching a conversion, given a starting state.\n",
    "    # Assumes the transition dataframe represents a Directed Acyclic Graph (DAG)\n",
    "        \n",
    "    # Get the transition probabilities for the starting state we're evaluating\n",
    "    df_nodes = df_transitions[df_transitions['t'] == starting_state]\n",
    "\n",
    "    \n",
    "    # Loop through the starting states and either return the probability for \n",
    "    # a leaf node, or recurse to keep following the tree.\n",
    "    \n",
    "    node_conversion_probability = 0\n",
    "    \n",
    "    child_node_proabilities = []\n",
    "    \n",
    "    for index, row in df_nodes.iterrows():\n",
    "        \n",
    "        # These are leaf nodes: either an exit or conversion\n",
    "        if row['t_plus_1'] == 'Exit':\n",
    "            print_node(row, debug)\n",
    "            child_node_proabilities.append(0)\n",
    "        \n",
    "        elif row['t_plus_1'] == 'Conversion':\n",
    "            print_node(row, debug)\n",
    "            child_node_proabilities.append(row['probability'])\n",
    "        \n",
    "        # This is a parent node: Keep following the rabbit hole\n",
    "        else:\n",
    "            \n",
    "            # Have we cached the total probability for this node???\n",
    "            if row['t_plus_1'] in calculated_nodes:\n",
    "                if debug:\n",
    "                    print('Cache Hit for %s! Cum probability from child: %0.2f' % (row['t_plus_1'], calculated_nodes[row['t_plus_1']]))\n",
    "                child_probability = calculated_nodes[row['t_plus_1']]\n",
    "            \n",
    "            # No cached value found. We'll walk through the tree to calculated the value.\n",
    "            else:\n",
    "                # Recursive call\n",
    "                child_probability = calc_conversion_probability(row['t_plus_1'], \n",
    "                                                                df_transitions, \n",
    "                                                                cum_probability + row['probability'],\n",
    "                                                                calculated_nodes, \n",
    "                                                                debug)\n",
    "                node_conversion_prob = child_probability * row['probability']\n",
    "                \n",
    "            print_node(row, debug)\n",
    "            child_node_proabilities.append(node_conversion_prob)\n",
    "                    \n",
    "            if debug:\n",
    "                print('%s > %s' % (row['t'], row['t_plus_1']))\n",
    "                print('Cum Prob from Child : %0.2f' % child_probability)\n",
    "                print('Prob to Child Node  : %0.2f' % row['probability'])\n",
    "                print('Node Conv Proability: %0.2f' % node_conversion_prob)\n",
    "                print('----------------------------')\n",
    "    \n",
    "    total_node_probability = sum(child_node_proabilities)\n",
    "    if debug:\n",
    "        print('Node Conversion Probability for %s: %0.2f' % (starting_state, total_node_probability))\n",
    "        print('----------------------------')\n",
    "    \n",
    "    # We'll cache the calculated total probability for the node, so we don't have to calculate it again.\n",
    "    calculated_node_probabilities[starting_state] = total_node_probability\n",
    "        \n",
    "    return total_node_probability\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 92,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "====== START DEBUG PRINT ======\n",
      "C3 > Conversion\n",
      "Type: Leaf\n",
      "Prob: 0.50\n",
      "----------------------------\n",
      "C3 > Exit\n",
      "Type: Leaf\n",
      "Prob: 0.50\n",
      "----------------------------\n",
      "Node Conversion Probability for C3: 0.50\n",
      "----------------------------\n",
      "C2 > C3\n",
      "Type: Parent\n",
      "Prob: 1.00\n",
      "----------------------------\n",
      "C2 > C3\n",
      "Cum Prob from Child : 0.50\n",
      "Prob to Child Node  : 1.00\n",
      "Node Conv Proability: 0.50\n",
      "----------------------------\n",
      "Node Conversion Probability for C2: 0.50\n",
      "----------------------------\n",
      "C1 > C2\n",
      "Type: Parent\n",
      "Prob: 0.50\n",
      "----------------------------\n",
      "C1 > C2\n",
      "Cum Prob from Child : 0.50\n",
      "Prob to Child Node  : 0.50\n",
      "Node Conv Proability: 0.25\n",
      "----------------------------\n",
      "C1 > Exit\n",
      "Type: Leaf\n",
      "Prob: 0.50\n",
      "----------------------------\n",
      "Node Conversion Probability for C1: 0.25\n",
      "----------------------------\n",
      "Start > C1\n",
      "Type: Parent\n",
      "Prob: 0.67\n",
      "----------------------------\n",
      "Start > C1\n",
      "Cum Prob from Child : 0.25\n",
      "Prob to Child Node  : 0.67\n",
      "Node Conv Proability: 0.17\n",
      "----------------------------\n",
      "Cache Hit for C2! Cum probability from child: 0.50\n",
      "Start > C2\n",
      "Type: Parent\n",
      "Prob: 0.33\n",
      "----------------------------\n",
      "Start > C2\n",
      "Cum Prob from Child : 0.50\n",
      "Prob to Child Node  : 0.33\n",
      "Node Conv Proability: 0.17\n",
      "----------------------------\n",
      "Node Conversion Probability for Start: 0.33\n",
      "----------------------------\n",
      "====== END DEBUG PRINT ======\n",
      " \n",
      "Total Conversion Probability from Start: 0.33\n"
     ]
    }
   ],
   "source": [
    "starting_node = 'Start'\n",
    "print('====== START DEBUG PRINT ======')\n",
    "total_probability = calc_conversion_probability(starting_node, df_transition_prob, 0, calculated_node_probabilities)\n",
    "print('====== END DEBUG PRINT ======')\n",
    "print(' ')\n",
    "print('Total Conversion Probability from %s: %0.2f' % (starting_node, total_probability))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5. Calculate Removal Effect"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>t</th>\n",
       "      <th>t_plus_1</th>\n",
       "      <th>probability</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>C1</td>\n",
       "      <td>C2</td>\n",
       "      <td>0.500000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>C1</td>\n",
       "      <td>Exit</td>\n",
       "      <td>0.500000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>C2</td>\n",
       "      <td>C3</td>\n",
       "      <td>1.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>C3</td>\n",
       "      <td>Conversion</td>\n",
       "      <td>0.500000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>C3</td>\n",
       "      <td>Exit</td>\n",
       "      <td>0.500000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Start</td>\n",
       "      <td>C1</td>\n",
       "      <td>0.666667</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Start</td>\n",
       "      <td>C2</td>\n",
       "      <td>0.333333</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       t    t_plus_1  probability\n",
       "0     C1          C2     0.500000\n",
       "1     C1        Exit     0.500000\n",
       "2     C2          C3     1.000000\n",
       "3     C3  Conversion     0.500000\n",
       "4     C3        Exit     0.500000\n",
       "5  Start          C1     0.666667\n",
       "6  Start          C2     0.333333"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Let's look at our transition probability table again. We'll tweak this to calculate the \n",
    "# removal effect.\n",
    "df_transition_prob"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 63,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "unique_channels = df_transition_prob[df_transition_prob['t'] != 'Start']['t'].unique()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 100,
   "metadata": {},
   "outputs": [],
   "source": [
    "pd.options.mode.chained_assignment = None  # I know, this makes me a very bad person.\n",
    "removal_effects = dict()\n",
    "\n",
    "# Remove each channel and calculate the impact\n",
    "for channel in unique_channels:\n",
    "    \n",
    "    # Remove the channel from the transition probability matrix\n",
    "    df_reduced_graph = df_transition_prob[df_transition_prob['t'] != channel]\n",
    "    df_reduced_graph.loc[df_reduced_graph['t_plus_1']==channel, 't_plus_1'] = 'Exit'\n",
    "    \n",
    "    # Recalculate the total conversion probability\n",
    "    calculated_node_probabilities = dict()\n",
    "    new_total_probability = calc_conversion_probability('Start', \n",
    "                                                        df_reduced_graph, \n",
    "                                                        0, \n",
    "                                                        calculated_node_probabilities, \n",
    "                                                        debug=False)\n",
    "    \n",
    "    # Calculate the difference in conversion probability\n",
    "    removal_effect = (total_probability - new_total_probability)/total_probability\n",
    "    removal_effects[channel] = removal_effect\n",
    "    "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 6. Results: Which channels have the greatest impact?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 101,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Removal effect by channel:\n",
      "C1: 0.50\n",
      "C2: 1.00\n",
      "C3: 1.00\n"
     ]
    }
   ],
   "source": [
    "print('Removal effect by channel:')\n",
    "for key, value in removal_effects.items(): \n",
    "    print('%s: %0.2f' % (key, value))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}