{ "cells": [ { "cell_type": "markdown", "id": "8b7270fa", "metadata": { "toc": true }, "source": [ "

Table of Contents

\n", "
" ] }, { "cell_type": "code", "execution_count": 1, "id": "81492a81", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# code for loading the format for the notebook\n", "import os\n", "\n", "# path : store the current path to convert back to it later\n", "path = os.getcwd()\n", "os.chdir(os.path.join('..', 'notebook_format'))\n", "\n", "from formats import load_style\n", "load_style(plot_style=False)" ] }, { "cell_type": "code", "execution_count": 2, "id": "1f56ba35", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Author: Ethen\n", "\n", "Python implementation: CPython\n", "Python version : 3.7.11\n", "IPython version : 7.27.0\n", "\n", "ortools: 9.0.9048\n", "\n" ] } ], "source": [ "os.chdir(path)\n", "\n", "# 1. magic to print version\n", "# 2. magic so that the notebook will reload external python modules\n", "%load_ext watermark\n", "%load_ext autoreload\n", "%autoreload 2\n", "\n", "from collections import namedtuple\n", "from ortools.linear_solver import pywraplp\n", "\n", "%watermark -a 'Ethen' -d -t -v -p ortools" ] }, { "cell_type": "markdown", "id": "e1078c7e", "metadata": {}, "source": [ "# Operation Research Quick Intro Via Ortools" ] }, { "cell_type": "markdown", "id": "20915d8d", "metadata": {}, "source": [ "The way to think about operation research, or optimization problem is we want to maximizie/minimize some objective, while being subjected to certain constraints.\n", "\n", "For example, say we are deciding whether to buy ice cream or boba tea for dessert. Each type of food has an associated value, and cost, while we have a certain budget that we don't wish to exceed.\n", "\n", "\\begin{align}\n", "& \\text{maximize}\n", "&& \\text{value}_{\\text{ice_cream}} \\cdot \\text{ice_cream} + \\text{value}_{\\text{boba}} \\cdot \\text{boba} \\nonumber \\\\\n", "& \\text{subject to}\n", "&& \\text{cost}_{\\text{ice_cream}} \\cdot \\text{ice_cream} + \\text{cost}_{\\text{boba}} \\cdot \\text{boba} \\leq \\text{budget}\n", "\\end{align}\n", "\n", "Say we are able to replace the value, cost, and budget part with actual numbers (in practice, assigning actual numbers to each of these coefficients is often times core pieces of the work).\n", "\n", "\\begin{align}\n", "& \\text{maximize}\n", "&& 3 \\cdot \\text{ice_cream} + 2 \\cdot \\text{boba} \\nonumber \\\\\n", "& \\text{subject to}\n", "&& 2 \\cdot \\text{ice_cream} + 1 \\cdot \\text{boba} \\leq 1\n", "\\end{align}\n", "\n", "Given this toy problem, we can eye ball the solution, and see that we should use our limited budget to buy a boba tea for dessert. Operation research, a.k.a optimization techniques helps us algorithmically find solutions for these types of problems at a much larger scale.\n", "\n", "The following section, uses `ortools` library to solve this problem programmatically." ] }, { "cell_type": "code", "execution_count": 3, "id": "91764b4d", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[DessertInfo(name='ice_cream', value=3, cost=2),\n", " DessertInfo(name='boba', value=2, cost=1)]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "budget = 1\n", "DessertInfo = namedtuple('DessertInfo', ['name', 'value', 'cost'])\n", "dessert_infos = [\n", " DessertInfo('ice_cream', 3, 2),\n", " DessertInfo('boba', 2, 1),\n", "]\n", "num_desserts = len(dessert_infos)\n", "dessert_infos" ] }, { "cell_type": "code", "execution_count": 4, "id": "e007973d", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimal Value: = 2.0\n", "ice_cream 0.0\n", "boba 1.0\n" ] } ], "source": [ "# creates solver\n", "solver = pywraplp.Solver.CreateSolver('GLOP')\n", "\n", "# creates variables\n", "variables = [solver.NumVar(0, solver.infinity(), dessert_info.name) for dessert_info in dessert_infos]\n", "\n", "# define constraints\n", "constraint_coefficients = [dessert_info.cost for dessert_info in dessert_infos]\n", "constraint = [constraint_coefficients[i] * variables[i] for i in range(num_desserts)]\n", "solver.Add(solver.Sum(constraint) <= budget)\n", "\n", "# define objective\n", "objective_coefficients = [dessert_info.value for dessert_info in dessert_infos]\n", "objective = constraint = [objective_coefficients[i] * variables[i] for i in range(num_desserts)]\n", "solver.Maximize(solver.Sum(objective))\n", "\n", "# solve\n", "status = solver.Solve()\n", "\n", "# extract optimal/feasible value\n", "if status == pywraplp.Solver.OPTIMAL or status == pywraplp.Solver.FEASIBLE:\n", " optimal_value = solver.Objective().Value()\n", " print(f'Optimal Value: = {optimal_value}')\n", " for i in range(num_desserts):\n", " print(variables[i].name(), variables[i].solution_value())" ] }, { "cell_type": "markdown", "id": "f6dcc65f", "metadata": {}, "source": [ "A couple of important things to note:\n", "\n", "- We are solving a Linear Programming problem, where we are computing the best solution to a given problem modeled as a series of linear relationships.\n", "- In this article, we won't be diving into the algorithms/solvers that are the workhorse behind the scenes that's finding the solution for us, and focus on how to frame the optimization problem.\n", "- We didn't explicitly specify this in our optimization formula, but notice the definition of `NumVar` specifies that our variables can take on numeric solutions. Often times, our problem might require some of the variables to be integers, these are called Mixed Integer Programming. e.g. In our example, we probably can't buy 1.5 portion of boba. In these cases, we can specify our variables to be `IntVar`.\n", "- There're other open sourced frameworks other than `ortools` out there, feel free to pick and choose based on preferences or speed. The exact API might be different, but the main idea revolves around defining the objective, defining the variables, adding the constraints, solving it and extracting the optimal/feasible solution." ] }, { "cell_type": "markdown", "id": "af86e980", "metadata": {}, "source": [ "## Assignment Problem" ] }, { "cell_type": "markdown", "id": "690e867d", "metadata": {}, "source": [ "Continuing with our discussions around Mixed Integer Programming, a closely related problem is the assignment problem, where our variables involves boolean decisions of 0 and 1 values.\n", "\n", "We'll use the examples from this blog post, [Blog: Towards optimal personalization: synthesisizing machine learning and operations research](https://www.ethanrosenthal.com/2016/08/30/towards-optimal-personalization/).\n", "\n", "Say we are working in the marketing team, and we have different types of churn prevention channel, each having different prices, while different users/customers' retention rate is different for each channel. Our constraint is not spending above our monthly marketing budget, and the goal is to maxmize the total number of retained customers.\n", "\n", "\n", "\\begin{align}\n", "\\text{maximize}\n", "& \\sum_{u, c} R_{u, c} A_{u, c} \\nonumber \\\\\n", "\\text{subject to}\n", "& \\sum_{u, c} P_{u, c} A_{u, c} \\leq B \\\\\n", "& \\sum_{c} A_{u, c} = 1, \\forall u \\in U \\\\\n", "& a_{u, c} \\in \\{0, 1\\}\n", "\\end{align}\n", "\n", "Where:\n", "\n", "- $U$: is the set of all users.\n", "- $C$: is the set of all channels.\n", "- $R_{u, c}$: is the rentention probability if we were to notify the user, $u$, using the channel $c$.\n", "- $A_{u, c}$: is the assignment boolean decision variable, i.e. it takes on the value of 1 if we decided to reach out to user $u$ with channel $c$, 0 otherwise.\n", "- $P_{u, c}$: is the price/cost if we were to notify the user, $u$, using the channel $c$.\n", "- We have a constraint saying each customer can only receive the retention message via one channel, to prevent bombarding them.\n", "- As well as a constraint saying our cost shouldn't exceed our monthly budget $B$.\n", "\n", "Let's say we have 4 channels: email (0.25), push notification (0.3), text message (0.85), and phone call (5.0). Number in parenthesis indicates the cost/price.\n", "As for the retention probability, we will be using some randomly generated numbers, but imagine in real world scenarios where this can come from aggregated historical information, or even generated by some machine learning models." ] }, { "cell_type": "code", "execution_count": 5, "id": "5699cc9d", "metadata": {}, "outputs": [], "source": [ "budget = 1000\n", "price = [25, 30, 85, 250]\n", "\n", "# rentention probability for each customer and channel pair\n", "retention_prob = [\n", " [0.02, 0.27, 0.17, 0.87],\n", " [0.14, 0.21, 0.28, 0.014],\n", " [0.13, 0.003, 0.016, 0.64],\n", " [0.14, 0.04, 0.14, 0.26],\n", " [0.04, 0.24, 0.11, 0.31],\n", "]\n", "num_users = len(retention_prob)\n", "num_channels = len(retention_prob[0])" ] }, { "cell_type": "code", "execution_count": 6, "id": "cd035dbb", "metadata": {}, "outputs": [], "source": [ "# creates the solver for the mixed integer programming\n", "solver = pywraplp.Solver.CreateSolver('SCIP')\n", "\n", "# variable: assignment problem, creating a dictionary of binary variables\n", "variables = {}\n", "for i in range(num_users):\n", " for j in range(num_channels):\n", " variables[i, j] = solver.IntVar(0, 1, f'prob{i}_{j}')" ] }, { "cell_type": "code", "execution_count": 7, "id": "aa12b233", "metadata": {}, "outputs": [ { "data": { "text/plain": [ " >" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# constraint: each user is assigned to at most 1 channel.\n", "for i in range(num_users):\n", " solver.Add(solver.Sum([variables[i, j] for j in range(num_channels)]) <= 1)\n", "\n", "# constraint: total cost should not exceed budget\n", "constraints = []\n", "for j in range(num_channels):\n", " for i in range(num_users):\n", " constraint = price[j] * variables[i, j]\n", " constraints.append(constraint)\n", "\n", "solver.Add(solver.Sum(constraints) <= budget) " ] }, { "cell_type": "code", "execution_count": 8, "id": "a8923640", "metadata": {}, "outputs": [], "source": [ "# objective\n", "objective_terms = []\n", "for i in range(num_users):\n", " for j in range(num_channels):\n", " objective_terms.append(retention_prob[i][j] * variables[i, j])\n", "\n", "solver.Maximize(solver.Sum(objective_terms))" ] }, { "cell_type": "code", "execution_count": 9, "id": "ce431e66", "metadata": {}, "outputs": [], "source": [ "# invokes the solver\n", "status = solver.Solve()" ] }, { "cell_type": "code", "execution_count": 10, "id": "a5d6ed80", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimal Value: = 2.29\n", "User 0 assigned to Channel 3, Cost = 250\n", "User 1 assigned to Channel 2, Cost = 85\n", "User 2 assigned to Channel 3, Cost = 250\n", "User 3 assigned to Channel 3, Cost = 250\n", "User 4 assigned to Channel 1, Cost = 30\n" ] } ], "source": [ "if status == pywraplp.Solver.OPTIMAL or status == pywraplp.Solver.FEASIBLE:\n", " optimal_value = solver.Objective().Value()\n", " print(f'Optimal Value: = {optimal_value}')\n", " for i in range(num_users):\n", " for j in range(num_channels):\n", " # check indicator variable's value, with tolerance for floating point arithmetic\n", " if variables[i, j].solution_value() > 0.5:\n", " print(f'User {i} assigned to Channel {j}, Cost = {price[j]}')" ] }, { "cell_type": "markdown", "id": "959853b5", "metadata": {}, "source": [ "In this article, we took a sneak peak into some problems that can benefit from leveraging optimization. The problems that we deal with in real world settings can be a lot more complicated than the examples seen here, but hopefully, this gives you the idea that whenever we see a problem that involves maximizing some objectives given some constraint, we have a tool at hand that we can turn to." ] }, { "cell_type": "markdown", "id": "f63d94f5", "metadata": {}, "source": [ "# Reference" ] }, { "cell_type": "markdown", "id": "ae8659c4", "metadata": {}, "source": [ "- [Blog: I'm all about ML, but let's talk about OR](https://www.ethanrosenthal.com/2016/07/20/lets-talk-or/)\n", "- [Blog: Towards optimal personalization: synthesisizing machine learning and operations research](https://www.ethanrosenthal.com/2016/08/30/towards-optimal-personalization/)\n", "- [Blog: Add Constrained Optimization To Your Toolbelt](https://multithreaded.stitchfix.com/blog/2018/06/21/constrained-optimization/)\n", "- [Or Tools Documentation: Solving an Assignment Problem](https://developers.google.com/optimization/assignment/assignment_example)\n", "- [Notes: Transformations in Integer Programming](https://ocw.mit.edu/courses/sloan-school-of-management/15-053-optimization-methods-in-management-science-spring-2013/tutorials/MIT15_053S13_tut09.pdf)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.11" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": true, "toc_position": { "height": "calc(100% - 180px)", "left": "10px", "top": "150px", "width": "227.7px" }, "toc_section_display": true, "toc_window_display": true } }, "nbformat": 4, "nbformat_minor": 5 }