{ "cells": [ { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "dW9C26GZCIJc" }, "source": [ "# An overview of the basic approaches to the uplift modeling problem\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "9Mz7V_YaCIKC" }, "source": [ "## Content\n", "\n", "* [Introduction](#Introduction)\n", "* [1. Single model approaches](#1.-Single-model-approaches)\n", "    * [1.1 Single model with treatment as feature](#1.1-Single-model-with-treatment-as-feature)\n", "    * [1.2 Class Transformation](#1.2-Class-Transformation)\n", "* [2. Approaches with two models](#2.-Approaches-with-two-models)\n", "    * [2.1 Two independent models](#2.1-Two-independent-models)\n", "    * [2.2 Two dependent models](#2.2-Two-dependent-models)\n", "* [Conclusion](#Conclusion)\n", "\n", "## Introduction\n", "\n", "Before discussing uplift modeling, let's imagine a situation:\n", "\n", "A customer comes to you with a problem: a popular product needs to be advertised via SMS.\n", "\n", "And then you begin to realize that the product is already popular, that customers often install it without any communication, that an ordinary binary classifier will mostly find exactly these customers, and that the cost of communication is critical...\n", "\n", "Historically, marketers divide all customers into four categories according to how communication affects them:\n", "\n", "

\n", " \n", "

\n", "\n", "- **Do-Not-Disturbs** *(a.k.a. Sleeping-dogs)* respond negatively to a marketing communication. They will purchase if *NOT* treated and will *NOT* purchase *IF* treated. Contacting them not only wastes the marketing budget but also does harm: for instance, targeted customers may abandon products or services they already use. In terms of math: $W_i = 1, Y_i = 0$ or $W_i = 0, Y_i = 1$.\n", "- **Lost Causes** will *NOT* purchase the product *WHETHER OR NOT* they are contacted. The marketing budget is wasted in this case too, because the communication has no effect. In terms of math: $W_i = 1, Y_i = 0$ or $W_i = 0, Y_i = 0$.\n", "- **Sure Things** will purchase *ANYWAY*, whether or not they are contacted. There is no reason to spend the budget on them, because the communication again has no effect. In terms of math: $W_i = 1, Y_i = 1$ or $W_i = 0, Y_i = 1$.\n", "- **Persuadables** always respond *POSITIVELY* to the marketing communication. They will purchase *ONLY* if contacted (or sometimes they purchase *MORE* or *EARLIER* only if contacted). This type of customer should be the only target of the marketing campaign. In terms of math: $W_i = 0, Y_i = 0$ or $W_i = 1, Y_i = 1$.\n", "\n", "\n", "Because we can't both communicate and not communicate with the same customer, we can never observe exactly which type a particular customer belongs to.\n", "\n", "Depending on the product characteristics and the structure of the customer base, some types may be absent. In addition, customer response depends heavily on the characteristics of the campaign, such as the communication channel or the type and size of the marketing offer. 
To maximize profit, these parameters should be chosen carefully.\n", "\n", "Thus, when we predict an uplift score and select the segment with the highest scores, we are trying to find only one type: **persuadables**.\n", "\n", "In other words, in this task we don't want to predict the probability of the target action itself; we want to focus the advertising budget on the customers who will perform the target action only when we interact with them. To do that, we estimate two conditional probabilities separately for each client:\n", "\n", "\n", "* The probability of the target action when we influence the client.  \n", "    We assign such clients to the **test group (a.k.a. treatment)**: $P^T = P(Y=1 | W = 1)$,\n", "* The probability of the target action when we do not influence the client.  \n", "    We assign such clients to the **control group (a.k.a. control)**: $P^C = P(Y=1 | W = 0)$,\n", "\n", "where $Y$ is the binary flag for performing the target action, and $W$ is the binary flag for communication (in English-language literature, _treatment_).\n", "\n", "The causal effect itself is called **uplift** and is estimated as the difference between these two probabilities:\n", "\n", "$$uplift = P^T - P^C = P(Y = 1 | W = 1) - P(Y = 1 | W = 0)$$\n", "\n", "Predicting uplift is a causal inference task. The point is that you need to estimate the difference between two outcomes that are mutually exclusive for a particular client (either we interact with a person or we don't; both cannot happen at once). This is why building uplift models places additional requirements on the source data.\n", "\n", "To get a training sample for uplift modeling, you need to conduct an experiment: \n", "1. Randomly split a representative part of the client base into a test group and a control group\n", "2. Communicate with the test group\n", "\n", "The data obtained from such a pilot will allow us to build an uplift prediction model later. 
It is also worth noting that the experiment should be as similar as possible to the campaign that will later be launched on a larger scale. The only difference between the experiment and the campaign should be that during the pilot we choose clients for interaction at random, while during the campaign we choose them based on the predicted uplift. If the campaign that is eventually launched differs significantly from the experiment used to collect the data, the resulting model may be less reliable and accurate.\n", "\n", "So, the approaches to predicting uplift aim to assess the net effect of marketing campaigns on customers.\n", "\n", "All classical approaches to uplift modeling can be divided into two classes:\n", "1. Approaches with a single model\n", "2. Approaches with two models\n", "\n", "Let's download [RetailHero.ai contest data](https://retailhero.ai/c/uplift_modeling/overview):" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "ExecuteTime": { "end_time": "2020-05-30T22:29:36.009546Z", "start_time": "2020-05-30T22:29:36.005593Z" }, "colab": { "base_uri": "https://localhost:8080/", "height": 700 }, "colab_type": "code", "id": "DKvg0hDcCfvi", "outputId": "c1864df4-2d46-4817-f865-2f3d46eb0c21" }, "outputs": [], "source": [ "import urllib.request\n", "\n", "# download the archive (the /content paths assume a Google Colab runtime)\n", "url = 'https://drive.google.com/u/0/uc?id=1fkxNmihuS15kk0PP0QcphL_Z3_z8LLeb&export=download'\n", "urllib.request.urlretrieve(url, '/content/retail_hero.zip')\n", "\n", "!unzip /content/retail_hero.zip\n", "!pip install scikit-uplift catboost==0.22 -U" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "76e99aut_1nH" }, "source": [ "Now let's preprocess the data a bit:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "ExecuteTime": { "end_time": "2020-05-30T22:29:39.095040Z", "start_time": "2020-05-30T22:29:36.013799Z" }, "colab": {}, "colab_type": "code", "id": "wojgYP76CIKH" 
}, "outputs": [], "source": [ "%matplotlib inline\n", "\n", "import pandas as pd\n", "from sklearn.model_selection import train_test_split\n", "\n", "pd.set_option('display.max_columns', None)\n", "\n", "\n", "# read the data\n", "df_clients = pd.read_csv('/content/uplift_data/clients.csv', index_col='client_id')\n", "df_train = pd.read_csv('/content/uplift_data/uplift_train.csv', index_col='client_id')\n", "df_test = pd.read_csv('/content/uplift_data/uplift_test.csv', index_col='client_id')\n", "\n", "# extract features: convert the dates to Unix timestamps in seconds\n", "df_features = df_clients.copy()\n", "df_features['first_issue_time'] = \\\n", "    (pd.to_datetime(df_features['first_issue_date'])\n", "     - pd.Timestamp('1970-01-01')) // pd.Timedelta('1s')\n", "df_features['first_redeem_time'] = \\\n", "    (pd.to_datetime(df_features['first_redeem_date'])\n", "     - pd.Timestamp('1970-01-01')) // pd.Timedelta('1s')\n", "df_features['issue_redeem_delay'] = df_features['first_redeem_time'] \\\n", "    - df_features['first_issue_time']\n", "df_features = df_features.drop(['first_issue_date', 'first_redeem_date'], axis=1)\n", "\n", "# hold out 30% of the labeled clients for validation\n", "indices_train = df_train.index\n", "indices_test = df_test.index\n", "indices_learn, indices_valid = train_test_split(df_train.index, test_size=0.3, random_state=123)" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "LB42pR6iCIK9" }, "source": [ "For convenience, we will declare some variables:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "ExecuteTime": { "end_time": "2020-05-30T22:29:40.203839Z", "start_time": "2020-05-30T22:29:39.097441Z" }, "colab": {}, "colab_type": "code", "id": "_HyKmGQ_CILw" }, "outputs": [], "source": [ "X_train = df_features.loc[indices_learn, :]\n", "y_train = df_train.loc[indices_learn, 'target']\n", "treat_train = df_train.loc[indices_learn, 'treatment_flg']\n", "\n", "X_val = df_features.loc[indices_valid, :]\n", "y_val = df_train.loc[indices_valid, 'target']\n", "treat_val = df_train.loc[indices_valid, 'treatment_flg']\n", 
"\n", "X_train_full = df_features.loc[indices_train, :]\n", "y_train_full = df_train.loc[:, 'target']\n", "treat_train_full = df_train.loc[:, 'treatment_flg']\n", "\n", "X_test = df_features.loc[indices_test, :]\n", "\n", "cat_features = ['gender']\n", "\n", "# collects the uplift@30% score of each approach for the final comparison\n", "models_results = {\n", "    'approach': [],\n", "    'uplift@30%': []\n", "}" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "olRc5k9iCIMP" }, "source": [ "## 1. Single model approaches\n", "\n", "### 1.1 Single model with treatment as feature\n", "\n", "This is the most intuitive and simplest uplift modeling technique. The training set consists of two groups: treatment samples and control samples. A binary treatment flag is added to the training set as a feature. After the model is trained, it is applied twice at scoring time:\n", "once with the treatment flag set to 1 and once with the treatment flag set to 0. Subtracting the second prediction from the first for each test sample gives an estimate of the uplift.\n", "\n", "

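In symbols, the scoring step above can be sketched as follows. The notation is an assumption introduced only for illustration and is not taken from the notebook: $f(x, w)$ stands for the trained model's predicted probability of $Y = 1$ for a client with features $x$ and treatment flag $w$.\n", "\n", "$$uplift(x) = f(x, 1) - f(x, 0)$$\n", "\n", "For example, if a client gets $f(x, 1) = 0.30$ and $f(x, 0) = 0.22$, the estimated uplift is $0.30 - 0.22 = 0.08$.\n", "\n", "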
\n", " \n", "