{ "metadata": { "name": "", "signature": "sha256:28e0041450ebe78b81abd93300257d198bb1ef15a312f5c7c66b17366e495c3f" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": { "internals": { "slide_helper": "subslide_end", "slide_type": "subslide" }, "slide_helper": "slide_end", "slideshow": { "slide_type": "slide" } }, "source": [ "# Multi-armed Bandits\n", "\n", "### By Michael Els\n", "\n", "\n", "### 11/06/2014\n", "\n", "\n", "\n", "\n", "\n", "![logo](maxpoint.png)\n" ] }, { "cell_type": "markdown", "metadata": { "internals": { "slide_helper": "subslide_end", "slide_type": "subslide" }, "slide_helper": "slide_end", "slideshow": { "slide_type": "slide" } }, "source": [ "## Focus of this talk\n", "\n", "1. Not Math\n", "2. Not Programming\n", "3. Problem Solving\n", "4. Under-utilized methods\n", "5. Showcase Ipython Notebooks" ] }, { "cell_type": "markdown", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 4, "slide_helper": "subslide_end", "slide_type": "subslide" }, "slide_helper": "slide_end", "slideshow": { "slide_type": "slide" } }, "source": [ "## Background \u2013 What we do as a company\n", "\n", "1. Programmatic Advertising\n", "2. Custom ad campaigns for agencies and large companies\n", "3. Targeting Platform\n", " - Hyper local geo targeting\n", " - Cookie and cookieless user targeting\n", " - Contextual and Performance based media targeting" ] }, { "cell_type": "markdown", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 4, "slide_helper": "subslide_end", "slide_type": "subslide" }, "slide_helper": "slide_end", "slideshow": { "slide_type": "slide" } }, "source": [ "## Background \u2013 What does our system do\n", "\n", "- Real Time Bidding with 12B-15B ad requests\n", "- We want to serve several hundred million ads\n", "- Data intelligence paltform\n", "- Offline and online models to create scored lists\n", " - Users\n", " - Media\n", " - Geography\n", " - Time\n", " - Auction theory\n", "- Score every ad request\n", "- Bidders execute on scores in under 50ms roundtrip\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 4, "slide_helper": "subslide_end", "slide_type": "subslide" }, "slide_helper": "slide_end", "slideshow": { "slide_type": "slide" } }, "source": [ "## Background \u2013 What we do as data scientists\n", "\n", "1. Focus\n", " - Machine learning\n", " - Automation\n", " - Scale\n", "2. Process\n", " - Prototype examples\n", " - Implement in production if possible" ] }, { "cell_type": "markdown", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 4, "slide_type": "subslide" }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Background \u2013 The Media Optimization Problem\n", "\n", "We have access to many millions of domains which can each be broken into smaller segments or even URLs. For each ad campaign, how do we pick which parts of what domains to serve to ads to? And how much do we value each of those parts?" ] }, { "cell_type": "markdown", "metadata": { "internals": { "frag_number": 6, "slide_helper": "subslide_end" }, "slide_helper": "slide_end", "slideshow": { "slide_type": "fragment" } }, "source": [ "### Why?\n", "1. Find the best sites to show ads\n", "2. Always serve the best ads possible\n", "3. Learn performance optimally\n", "4. 
To beat competition" ] }, { "cell_type": "markdown", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 6, "slide_type": "subslide" }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Multi-armed bandits (MAB)\n", "\n", "A MAB is an online algorithm which chooses from a set of strategies in a sequence of trials so as to maximize the total payoff of the chosen strategies." ] }, { "cell_type": "markdown", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 8, "slide_helper": "subslide_end" }, "slide_helper": "slide_end", "slideshow": { "slide_type": "fragment" } }, "source": [ "Rephrased, it is the problem a gambler faces at a row of slot machines, sometimes known as \"one-armed bandits\", when deciding which machines to play, how many times to play each machine and in which order to play them. When played, each machine provides a random reward from a distribution specific to that machine. The objective of the gambler is to maximize the sum of rewards earned through a sequence of lever pulls.$^1$\n", "\n", "$^1$ Wikipedia" ] }, { "cell_type": "markdown", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 8, "slide_helper": "subslide_end", "slide_type": "subslide" }, "slide_helper": "slide_end", "slideshow": { "slide_type": "slide" } }, "source": [ "## Applications\n", "\n", "1. Computational advertising\n", "2. Website configuration\n", "3. Finance\n", "4. A/B testing\n", "5. Evolutionary theory\n", "\n", "In general, any system that emphasizes continuous improvement where products remain in a perpetual state of testing/exploration." ] }, { "cell_type": "markdown", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 8, "slide_type": "subslide" }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Strategies\n", "\n", "1. Greedy - Always play the arm you think is best\n", "2. Epsilon-greedy - Play the best arm for a proportion (1 - epsilon) of the trials, otherwise it explores other arms\n", "3. Epsilon-first - Use the first epsilon porportion of plays to explore the arms and thereafter exploit\n", "4. 
Bayesian probabilistic matching - Play an arm based on the probability that it is the best arm" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "import matplotlib.mlab as mlab\n", "import numpy.random as rand\n", "import numpy as np\n", "%matplotlib inline\n", "from scipy.stats import beta\n", "import random as random\n", "import scipy.integrate as integrate\n", "import math as math\n", "\n", "import time\n", "from bokeh.plotting import *\n", "from bokeh.objects import Glyph\n", "from bokeh.objects import Range1d \n", "output_notebook(url=\"default\")" ], "language": "python", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 8, "slide_helper": "subslide_end" }, "slide_helper": "slide_end", "slideshow": { "slide_type": "skip" } }, "outputs": [], "prompt_number": 42 }, { "cell_type": "markdown", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 8, "slide_type": "subslide" }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Modeling Random Events" ] }, { "cell_type": "markdown", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 13 }, "slideshow": { "slide_type": "fragment" } }, "source": [ "Pulling a bandit's arm once is Bernoulli with chance of success $p$\n", "$$\\\\$$\n", "\n", "$$\n", "f(x;p) = p^x(1-p)^{1-x}\n", "\\hskip2em \\text{ for } x \\in \\{0,1\\}\n", "\\\\\n", "$$\n" ] }, { "cell_type": "markdown", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 14 }, "slideshow": { "slide_type": "fragment" } }, "source": [ "The sum of $n$ independent, identically distributed Bernoulli random variables is binomial\n", "$$\\\\$$\n", "\n", "$$f(x;n,p) = \\genfrac(){0pt}{0}{n}{x} p^x(1-p)^{n-x}\n", "\\hskip2em \\text{ for } x \\in \\{0,1,2,...,n\\}\n", "$$" ] }, { "cell_type": "markdown", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 15, "slide_helper": "subslide_end" }, "slide_helper": "slide_end", "slideshow": { "slide_type": "fragment" } }, "source": [ "The Beta distribution is the conjugate prior of the binomial distribution\n", "$$\\\\$$\n", "\n", "$$\n", "f(x;\\alpha,\\beta) = \\frac{\\Gamma(\\alpha+\\beta)}{\\Gamma(\\alpha)\\Gamma(\\beta)}x^{\\alpha-1}(1-x)^{\\beta-1}\n", "$$\n" ] }, { "cell_type": "markdown", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 15, "slide_type": "subslide" }, "slideshow": { "slide_type": "slide" } }, "source": [ "## More on the Beta Distribution\n", "\n", "The mode of the Beta distribution is given by\n", "$$\\\\$$\n", "$$\\frac{\\alpha-1}{\\alpha+\\beta-2}$$\n", "$$\\\\$$\n", "\n", "where, starting from a uniform Beta(1,1) prior, $\\alpha - 1$ is the count of successful trials and $\\beta - 1$ the count of failed trials\n", "\n", "$$\\\\$$\n", "\n", "You can also update the Beta distribution with information from new trials with $\\alpha_{new}$ and $\\beta_{new}$ to get\n", "\n", "$$\\\\$$\n", "$$\n", "\\frac{\\Gamma(\\alpha'+\\beta')}{\\Gamma(\\alpha')\\Gamma(\\beta')}x^{\\alpha'-1}(1-x)^{\\beta'-1}\n", "$$\n", "$$\\\\$$\n", "\n", "where $$\\alpha' = \\alpha+\\alpha_{new}$$\n", "and\n", "$$\\beta' = \\beta+\\beta_{new}$$" ] }, { "cell_type": "code", "collapsed": false, "input": [ "def betaPlot(prob = 0.5, N = 100):\n", " \n", " figure(tools=\"\", \n", " title = 'Beta Distribution',\n", " x_axis_label = 'Success Probability',\n", " y_axis_label = 'PDF',\n", " x_range=Range1d(start=0, end=1))\n", "\n", " hold()\n", " \n", " a = 1\n", " b = 1\n", " x = np.linspace(0, 1, 100)\n", " \n", " 
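# a = b = 1 is the flat Beta(1,1) prior; the loop below adds one pseudo-count per pull\n", " 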
line(x, beta.pdf(x,a,b), color=\"#2E2EFE\", plot_width=800, plot_height=500, legend='Beta Posterior')\n", " line([prob,prob],[0,10], color=\"#FE2E2E\", line_dash = 'dashed', legend='True Mean')\n", "\n", " grid().grid_line_alpha=0.3\n", "\n", " renderer1 = [r for r in curplot().renderers if isinstance(r, Glyph)][0]\n", " ds1 = renderer1.data_source\n", " \n", " show()\n", "\n", " for i in range(N):\n", " test = rand.rand() < 0.5\n", " if i < 6:\n", " print 'a = %s and b = %s' %(a,b)\n", " raw_input(\"\")\n", " print 'Payoff: %s' %test.real\n", " a += test\n", " b += not(test)\n", " ds1.data[\"x\"] = x\n", " ds1.data[\"y\"] = beta.pdf(x,a,b)\n", " ds1._dirty = True\n", " cursession().store_objects(ds1)\n", " time.sleep(0.1)\n", " " ], "language": "python", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 15, "slide_helper": "subslide_end" }, "slide_helper": "slide_end", "slideshow": { "slide_type": "skip" } }, "outputs": [], "prompt_number": 5 }, { "cell_type": "code", "collapsed": false, "input": [ "betaPlot()" ], "language": "python", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 15, "slide_helper": "subslide_end", "slide_type": "subslide" }, "slide_helper": "slide_end", "slideshow": { "slide_type": "slide" } }, "outputs": [] }, { "cell_type": "markdown", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 15, "slide_type": "subslide" }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Create a Mulit-Armed Bandit Object" ] }, { "cell_type": "code", "collapsed": false, "input": [ "class MAB(object):\n", " \n", " def __init__(self,numArms):\n", " self.armPayoffs = rand.uniform(0,1,numArms)\n", " self.successes = np.ones(numArms)\n", " self.failures = np.ones(numArms)\n", " \n", " def pullArm(self,arm):\n", " test = rand.uniform()\n", " result = test < self.armPayoffs[arm]\n", " bestResult = test < max(self.armPayoffs)\n", " regret = bestResult - result\n", " self.successes[arm] += result\n", " self.failures[arm] += not(result)\n", " return result, regret\n", " " ], "language": "python", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 20, "slide_helper": "subslide_end" }, "slide_helper": "slide_end", "slideshow": { "slide_type": "fragment" } }, "outputs": [], "prompt_number": 6 }, { "cell_type": "markdown", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 20, "slide_type": "subslide" }, "slideshow": { "slide_type": "slide" } }, "source": [ "## So How Do You Measure the Performance Over Time?" 
] }, { "cell_type": "markdown", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 22, "slide_helper": "subslide_end" }, "slide_helper": "slide_end", "slideshow": { "slide_type": "fragment" } }, "source": [ "The standard is to use $regret$\n", "\n", "$$R_T = \\sum^T_{i=1}(m^*-m_{B(i)})\n", "$$\n", "\n", "where $m^*$ is the payout from the optimal arm and $m_{B(i)}$ is the payout from arm $i$\n" ] }, { "cell_type": "markdown", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 22, "slide_type": "subslide" }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Create a Basline Strategy" ] }, { "cell_type": "code", "collapsed": false, "input": [ "def strategyStratifiedSample(mab):\n", " \n", " samples = mab.successes + mab.failures\n", " bestEstArm = samples == min(samples)\n", " return random.sample(np.where(bestEstArm)[0], 1)[0]" ], "language": "python", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 24 }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "prompt_number": 7 }, { "cell_type": "code", "collapsed": false, "input": [ "def mab3Plot(strategy, pulls = 100):\n", " \n", " figure(tools=\"\", \n", " title = 'Multi-Armed Bandit',\n", " x_axis_label = 'Success Probability',\n", " y_axis_label = 'PDF',\n", " x_range=Range1d(start=0, end=1))\n", "\n", " hold()\n", " \n", " mab = MAB(3)\n", " x = np.linspace(0, 1, 100)\n", " \n", " line(x, beta.pdf(x,mab.successes[0],mab.failures[0]), color=\"#2E2EFE\", plot_width=800, plot_height=500, legend='Bandit 1 Posterior')\n", " line([mab.armPayoffs[0],mab.armPayoffs[0]],[0,10], color=\"#2E2EFE\", line_dash = 'dashed', legend='Bandit 1 Truth')\n", "\n", " line(x, beta.pdf(x,mab.successes[0],mab.failures[0]), color=\"#AC58FA\", plot_width=800, plot_height=500, legend='Bandit 2 Posterior')\n", " line([mab.armPayoffs[1],mab.armPayoffs[1]],[0,10], color=\"#AC58FA\", line_dash = 'dashed', legend='Bandit 2 Truth')\n", " \n", " line(x, beta.pdf(x,mab.successes[0],mab.failures[0]), color=\"#FE642E\", plot_width=800, plot_height=500, legend='Bandit 3 Posterior')\n", " line([mab.armPayoffs[2],mab.armPayoffs[2]],[0,10], color=\"#FE642E\", line_dash = 'dashed', legend='Bandit 3 Truth')\n", " \n", " grid().grid_line_alpha=0.3\n", "\n", " renderer0 = [r for r in curplot().renderers if isinstance(r, Glyph)][0]\n", " ds0 = renderer0.data_source\n", " \n", " renderer2 = [r for r in curplot().renderers if isinstance(r, Glyph)][2]\n", " ds2 = renderer2.data_source\n", " \n", " renderer4 = [r for r in curplot().renderers if isinstance(r, Glyph)][4]\n", " ds4 = renderer4.data_source\n", " \n", " show()\n", "\n", " for i in range(pulls):\n", " arm = strategy(mab)\n", " result, regret = mab.pullArm(arm)\n", " ds0.data[\"x\"] = x\n", " ds0.data[\"y\"] = beta.pdf(x,mab.successes[0],mab.failures[0])\n", " ds0._dirty = True\n", " \n", " ds2.data[\"x\"] = x\n", " ds2.data[\"y\"] = beta.pdf(x,mab.successes[1],mab.failures[1])\n", " ds2._dirty = True\n", " \n", " ds4.data[\"x\"] = x\n", " ds4.data[\"y\"] = beta.pdf(x,mab.successes[2],mab.failures[2])\n", " ds4._dirty = True\n", " \n", " cursession().store_objects(ds0)\n", " cursession().store_objects(ds2)\n", " cursession().store_objects(ds4)\n", " time.sleep(0.01)" ], "language": "python", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 24, "slide_helper": "subslide_end" }, "slide_helper": "slide_end", "slideshow": { "slide_type": "skip" } }, "outputs": [], "prompt_number": 8 }, { "cell_type": "code", 
"collapsed": false, "input": [ "mab3Plot(strategy=strategyStratifiedSample)" ], "language": "python", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 24, "slide_helper": "subslide_end", "slide_type": "subslide" }, "slide_helper": "slide_end", "slideshow": { "slide_type": "slide" } }, "outputs": [ { "html": [ "" ], "metadata": {}, "output_type": "display_data" } ], "prompt_number": 44 }, { "cell_type": "markdown", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 24, "slide_type": "subslide" }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Simulate a Basic Experiment" ] }, { "cell_type": "code", "collapsed": false, "input": [ "def stratifiedRegret(cdata):\n", " figure(tools=\"\", \n", " title = 'Multi-Armed Bandit Regret',\n", " x_axis_label = 'Iteration',\n", " y_axis_label = 'Regret',\n", " x_range=Range1d(start=0, end=pulls),\n", " y_range=Range1d(start=0, end=max(cdata)))\n", "\n", " hold()\n", "\n", " line(range(pulls), cdata, color=\"#FF4000\", plot_width=800, plot_height=500, legend='Stratified Sample')\n", "\n", " grid().grid_line_alpha=0.3\n", " legend()[0].orientation = \"top_left\"\n", " show()\n" ], "language": "python", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 24 }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "prompt_number": 9 }, { "cell_type": "code", "collapsed": false, "input": [ "mab = MAB(3)\n", "pulls = 1000\n", "data = np.zeros(pulls)\n", "\n", "for i in range(pulls):\n", " arm = strategyStratifiedSample(mab)\n", " result, regret = mab.pullArm(arm)\n", " data[i] = regret\n", " \n", "cdata = np.cumsum(data)\n", " \n", "print mab.armPayoffs\n", "\n", "stratifiedRegret(cdata)" ], "language": "python", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 29, "slide_helper": "subslide_end" }, "slide_helper": "slide_end", "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "[ 0.26952967 0.25392393 0.53431951]\n" ] }, { "html": [ "" ], "metadata": {}, "output_type": "display_data" } ], "prompt_number": 10 }, { "cell_type": "markdown", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 29, "slide_type": "subslide" }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Greedy Strategy" ] }, { "cell_type": "code", "collapsed": false, "input": [ "def strategyGreedy(mab):\n", " rates = mab.successes/(mab.successes + mab.failures)\n", " bestEstArm = rates == max(rates)\n", " return random.sample(np.where(bestEstArm)[0], 1)[0]" ], "language": "python", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 31, "slide_helper": "subslide_end" }, "slide_helper": "slide_end", "slideshow": { "slide_type": "fragment" } }, "outputs": [], "prompt_number": 11 }, { "cell_type": "code", "collapsed": false, "input": [ "mab3Plot(strategy=strategyGreedy)" ], "language": "python", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 31, "slide_helper": "subslide_end", "slide_type": "subslide" }, "slide_helper": "slide_end", "slideshow": { "slide_type": "slide" } }, "outputs": [ { "html": [ "" ], "metadata": {}, "output_type": "display_data" } ], "prompt_number": 45 }, { "cell_type": "markdown", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 31, "slide_type": "subslide" }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Epsilon Greedy Strategy" ] }, { "cell_type": "code", "collapsed": false, "input": 
[ "def strategyEpsilonGreedy(mab, epsilon = 0.1):\n", " if random.random() < epsilon:\n", " return random.randint(0,len(mab.successes)-1)\n", " else:\n", " return strategyGreedy(mab)" ], "language": "python", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 34, "slide_helper": "subslide_end" }, "slide_helper": "slide_end", "slideshow": { "slide_type": "fragment" } }, "outputs": [], "prompt_number": 12 }, { "cell_type": "markdown", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 34, "slide_type": "subslide" }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Epsilon First Strategy" ] }, { "cell_type": "code", "collapsed": false, "input": [ "def strategyEpsilonFirst(mab, pulls = 100, epsilon = 0.1):\n", " trials = pulls*epsilon\n", " samples = mab.successes + mab.failures\n", " if sum(samples) < trials:\n", " return strategyStratifiedSample(mab)\n", " else:\n", " return strategyGreedy(mab)" ], "language": "python", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 36 }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "prompt_number": 13 }, { "cell_type": "code", "collapsed": false, "input": [ "mab3Plot(strategy=strategyEpsilonFirst)" ], "language": "python", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 36 }, "slideshow": { "slide_type": "skip" } }, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "def basicRegretPlot(data):\n", " \n", " randBandit = np.cumsum(data.StratifiedSample.astype(float))\n", " greedyBandit = np.cumsum(data.Greedy.astype(float))\n", " epsGreedyBandit = np.cumsum(data.EpsilonGreedy.astype(float))\n", " epsFirstBandit = np.cumsum(data.EpsilonFirst.astype(float))\n", " \n", " figure(tools=\"\", \n", " title = 'Multi-Armed Bandit Regret',\n", " x_axis_label = 'Iteration',\n", " y_axis_label = 'Regret',\n", " x_range=Range1d(start=0, end=len(data)),\n", " y_range=Range1d(start=0, end=max(randBandit)))\n", "\n", " hold() \n", " \n", " line(range(len(randBandit)), randBandit, color=\"#FF4000\", plot_width=800, plot_height=500, legend='Stratified Sample')\n", " line(range(len(randBandit)), greedyBandit, color=\"#AC58FA\", plot_width=800, plot_height=500, legend='Greedy')\n", " line(range(len(randBandit)), epsGreedyBandit, color=\"#0080FF\", plot_width=800, plot_height=500, legend='Epsilon Greedy')\n", " line(range(len(randBandit)), epsFirstBandit, color=\"#01DF01\", plot_width=800, plot_height=500, legend='Epsilon First')\n", "\n", " grid().grid_line_alpha=0.3\n", " legend()[0].orientation = \"top_left\"\n", " show()\n" ], "language": "python", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 36, "slide_helper": "subslide_end" }, "slide_helper": "slide_end", "slideshow": { "slide_type": "skip" } }, "outputs": [], "prompt_number": 14 }, { "cell_type": "markdown", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 36, "slide_type": "subslide" }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Tweak the MAB Class" ] }, { "cell_type": "code", "collapsed": false, "input": [ "class MABs(object):\n", " \n", " def __init__(self,armPayoffs):\n", " self.armPayoffs = armPayoffs\n", " self.successes = np.ones(len(armPayoffs))\n", " self.failures = np.ones(len(armPayoffs))\n", " \n", " def pullArm(self,arm,test):\n", " result = test < self.armPayoffs[arm]\n", " bestResult = test < max(self.armPayoffs)\n", " regret = bestResult - result\n", " self.successes[arm] += result\n", " 
self.failures[arm] += not(result)\n", " return result, regret" ], "language": "python", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 40, "slide_helper": "subslide_end" }, "slide_helper": "slide_end", "slideshow": { "slide_type": "fragment" } }, "outputs": [], "prompt_number": 15 }, { "cell_type": "markdown", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 40, "slide_type": "subslide" }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Compare Strategies" ] }, { "cell_type": "code", "collapsed": false, "input": [ "def genBasicData(numArms = 3, pulls = 100):\n", "\n", " armPayoffs = rand.uniform(0,1,numArms)\n", "\n", " mab0 = MABs(armPayoffs)\n", " mab1 = MABs(armPayoffs)\n", " mab2 = MABs(armPayoffs)\n", " mab3 = MABs(armPayoffs)\n", "\n", " cols = ['StratifiedSample','Greedy','EpsilonGreedy','EpsilonFirst']\n", " data = pd.DataFrame(columns=cols)\n", "\n", " for i in range(pulls):\n", " test = rand.uniform()\n", " arm0 = strategyStratifiedSample(mab0)\n", " result0, regret0 = mab0.pullArm(arm0, test)\n", " arm1 = strategyGreedy(mab1)\n", " result1, regret1 = mab1.pullArm(arm1, test)\n", " arm2 = strategyEpsilonGreedy(mab2, 0.1)\n", " result2, regret2 = mab2.pullArm(arm2, test)\n", " arm3 = strategyEpsilonFirst(mab3, pulls, 0.1)\n", " result3, regret3 = mab3.pullArm(arm3, test)\n", "\n", " row = pd.DataFrame([dict(StratifiedSample = regret0,\n", " Greedy = regret1,\n", " EpsilonGreedy = regret2, \n", " EpsilonFirst = regret3)])\n", " data = data.append(row, ignore_index=True)\n", "\n", " #print mab0.armPayoffs\n", " return data" ], "language": "python", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 42, "slide_helper": "subslide_end" }, "slide_helper": "slide_end", "slideshow": { "slide_type": "fragment" } }, "outputs": [], "prompt_number": 16 }, { "cell_type": "code", "collapsed": false, "input": [ "data = genBasicData()\n", "basicRegretPlot(data)" ], "language": "python", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 42, "slide_helper": "subslide_end", "slide_type": "subslide" }, "slide_helper": "slide_end", "slideshow": { "slide_type": "slide" } }, "outputs": [ { "html": [ "" ], "metadata": {}, "output_type": "display_data" } ], "prompt_number": 43 }, { "cell_type": "markdown", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 42, "slide_type": "subslide" }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Average Performance" ] }, { "cell_type": "code", "collapsed": false, "input": [ "def basicSimulation(sims = 100):\n", " data = genBasicData()\n", " for i in range(1,sims):\n", " data += genBasicData()\n", " return data/100.0\n", " " ], "language": "python", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 45 }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "prompt_number": 18 }, { "cell_type": "code", "collapsed": false, "input": [ "data = basicSimulation()" ], "language": "python", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 46, "slide_helper": "subslide_end" }, "slide_helper": "slide_end", "slideshow": { "slide_type": "fragment" } }, "outputs": [], "prompt_number": 19 }, { "cell_type": "code", "collapsed": false, "input": [ "basicRegretPlot(data)" ], "language": "python", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 46, "slide_helper": "subslide_end", "slide_type": "subslide" }, "slide_helper": "slide_end", "slideshow": { 
"slide_type": "slide" } }, "outputs": [ { "html": [ "" ], "metadata": {}, "output_type": "display_data" } ], "prompt_number": 20 }, { "cell_type": "markdown", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 46, "slide_type": "subslide" }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Bayesian Probabilistic Matching" ] }, { "cell_type": "markdown", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 49, "slide_helper": "subslide_end" }, "slide_helper": "slide_end", "slideshow": { "slide_type": "fragment" } }, "source": [ "The binomial bandit assumes that the rewards in each configuration are independent Bernoulli random variables with success probabilities ($\\theta_1,...,\\theta_k$)$^1$.\n", "\n", "Assume a simple uninformative prior of $\\theta_a \\backsim U(0,1)$ with all $\\theta_i$ being independent.\n", "\n", "Let $Y_{at}$ and $N_{at}$ denote the cumulative number of successes and trials observed for arm $a$ up to time $t$. Then the posterior distribution of $\\theta = (\\theta_1,...,\\theta_k) $ is\n", "\n", "$$\n", "p(\\theta|\\boldsymbol y_t) = \\prod^k_{a=1}B(\\theta_a|Y_{at}+1,N_{at}-Y_{at}+1)\n", "\\\\\n", "$$\n", "\n", "Then the probability of arm $a$ being the optimal arm with all the information at time $t$ is\n", "\n", "$$\n", "w_{at} = \\int^1_0 B(\\theta_a|Y_{at}+1,N_{at}-Y_{at}+1)\n", "\\prod^k_{j \\ne a} P(\\theta_j<\\theta_a|Y_{jt}+1,N_{jt}-Y_{jt}+1)\n", "$$\n", "\n", "$$\\\\$$\n", "\n", "$^1$ 'A modern Bayesian look at the multi-armed bandit' - Steven Scott" ] }, { "cell_type": "markdown", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 49, "slide_type": "subslide" }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Bayesian Probabilistic Matching in Code" ] }, { "cell_type": "code", "collapsed": false, "input": [ "def bpmProbs(mab):\n", " y = mab.successes-1\n", " n = mab.successes + mab.failures - 2\n", "\n", " k = len(y)\n", " ans = np.zeros(k)\n", "\n", " for i in range(k):\n", " indx = range(k)\n", " del indx[i]\n", "\n", " def f(x):\n", " r = beta.pdf(x, y[i]+1, n[i]-y[i]+1)\n", " for j in indx:\n", " r = r * beta.cdf(x, y[j]+1, n[j]-y[j]+1)\n", " return r\n", "\n", " ans[i] = integrate.quad(f,0,1)[0]\n", " \n", " return ans\n", "\n" ], "language": "python", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 51, "slide_helper": "subslide_end" }, "slide_helper": "slide_end", "slideshow": { "slide_type": "fragment" } }, "outputs": [], "prompt_number": 21 }, { "cell_type": "markdown", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 51, "slide_type": "subslide" }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Bayesian Probabilistic Matching in Code continued" ] }, { "cell_type": "code", "collapsed": false, "input": [ "def weightedSample(weights):\n", " '''Return the index to sample based on weight vector'''\n", " r = random.random()\n", " cweight = np.cumsum(weights)\n", " choice = min(cweight[np.where(cweight >= r)])\n", " return np.where(cweight == choice)[0][0]" ], "language": "python", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 53 }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "prompt_number": 22 }, { "cell_type": "code", "collapsed": false, "input": [ "pd.Series([weightedSample([0.333,0.333,0.334]) for _ in range(10000)]).hist(figsize=(12,7))\n", "#pd.Series([weightedSample([0.8,0.1,0.1]) for _ in range(10000)]).hist(figsize=(12,7))" ], "language": 
"python", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 54, "slide_helper": "subslide_end" }, "slide_helper": "slide_end", "slideshow": { "slide_type": "fragment" } }, "outputs": [] }, { "cell_type": "markdown", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 54, "slide_type": "subslide" }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Bayesian Probabilistic Matching in Code continued" ] }, { "cell_type": "code", "collapsed": false, "input": [ "def StrategyBPM(mab):\n", " weights = bpmProbs(mab)\n", " return weightedSample(weights)" ], "language": "python", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 56, "slide_helper": "subslide_end" }, "slide_helper": "slide_end", "slideshow": { "slide_type": "fragment" } }, "outputs": [], "prompt_number": 23 }, { "cell_type": "code", "collapsed": false, "input": [ "mab3Plot(strategy=StrategyBPM)" ], "language": "python", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 56, "slide_type": "subslide" }, "slideshow": { "slide_type": "slide" } }, "outputs": [ { "html": [ "" ], "metadata": {}, "output_type": "display_data" } ], "prompt_number": 46 }, { "cell_type": "code", "collapsed": false, "input": [ "def genBasicData2(numArms = 3, pulls = 100):\n", "\n", " armPayoffs = rand.uniform(0,1,numArms)\n", "\n", " mab0 = MABs(armPayoffs)\n", " mab1 = MABs(armPayoffs)\n", " mab2 = MABs(armPayoffs)\n", " mab3 = MABs(armPayoffs)\n", " mab4 = MABs(armPayoffs)\n", "\n", " cols = ['StratifiedSample','Greedy','EpsilonGreedy','EpsilonFirst','BPM']\n", " data = pd.DataFrame(columns=cols)\n", "\n", " for i in range(pulls):\n", " test = rand.uniform()\n", " arm0 = strategyStratifiedSample(mab0)\n", " result0, regret0 = mab0.pullArm(arm0, test)\n", " arm1 = strategyGreedy(mab1)\n", " result1, regret1 = mab1.pullArm(arm1, test)\n", " arm2 = strategyEpsilonGreedy(mab2, 0.1)\n", " result2, regret2 = mab2.pullArm(arm2, test)\n", " arm3 = strategyEpsilonFirst(mab3, pulls, 0.1)\n", " result3, regret3 = mab3.pullArm(arm3, test)\n", " arm4 = StrategyBPM(mab4)\n", " result4, regret4 = mab4.pullArm(arm4, test)\n", "\n", " row = pd.DataFrame([dict(StratifiedSample = regret0,\n", " Greedy = regret1,\n", " EpsilonGreedy = regret2, \n", " EpsilonFirst = regret3,\n", " BPM = regret4)])\n", " data = data.append(row, ignore_index=True)\n", "\n", " #print mab0.armPayoffs\n", " return data" ], "language": "python", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 56 }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "prompt_number": 24 }, { "cell_type": "code", "collapsed": false, "input": [ "def basicSimulation2(sims = 100):\n", " data = genBasicData2()\n", " dataseries = []\n", " dataseries.append(data.sum())\n", " for i in range(1,sims):\n", " d = genBasicData2()\n", " data += d\n", " dataseries.append(d.sum())\n", " return data/100.0, dataseries\n", " " ], "language": "python", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 56 }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "prompt_number": 25 }, { "cell_type": "code", "collapsed": false, "input": [ "def basicRegretPlot2(data):\n", " \n", " randBandit = np.cumsum(data.StratifiedSample.astype(float))\n", " greedyBandit = np.cumsum(data.Greedy.astype(float))\n", " epsGreedyBandit = np.cumsum(data.EpsilonGreedy.astype(float))\n", " epsFirstBandit = np.cumsum(data.EpsilonFirst.astype(float))\n", " BPMBandit = np.cumsum(data.BPM.astype(float))\n", " 
\n", " figure(tools=\"\", \n", " title = 'Multi-Armed Bandit Regret',\n", " x_axis_label = 'Iteration',\n", " y_axis_label = 'Regret',\n", " x_range=Range1d(start=0, end=len(data)),\n", " y_range=Range1d(start=0, end=max(randBandit)))\n", "\n", " hold() \n", " \n", " line(range(len(randBandit)), randBandit, color=\"#FF4000\", plot_width=800, plot_height=500, legend='Stratified Sample')\n", " line(range(len(randBandit)), greedyBandit, color=\"#AC58FA\", plot_width=800, plot_height=500, legend='Greedy')\n", " line(range(len(randBandit)), epsGreedyBandit, color=\"#0080FF\", plot_width=800, plot_height=500, legend='Epsilon Greedy')\n", " line(range(len(randBandit)), epsFirstBandit, color=\"#01DF01\", plot_width=800, plot_height=500, legend='Epsilon First')\n", " line(range(len(randBandit)), BPMBandit, color=\"#FF0000\", plot_width=800, plot_height=500, legend='BPM')\n", "\n", " grid().grid_line_alpha=0.3\n", " legend()[0].orientation = \"top_left\"\n", " show()\n" ], "language": "python", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 56 }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "prompt_number": 26 }, { "cell_type": "code", "collapsed": false, "input": [ "data, dataseries = basicSimulation2()" ], "language": "python", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 56, "slide_helper": "subslide_end" }, "slide_helper": "slide_end", "slideshow": { "slide_type": "skip" } }, "outputs": [], "prompt_number": 27 }, { "cell_type": "code", "collapsed": false, "input": [ "basicRegretPlot2(data)" ], "language": "python", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 56, "slide_helper": "subslide_end", "slide_type": "subslide" }, "slide_helper": "slide_end", "slideshow": { "slide_type": "slide" } }, "outputs": [ { "html": [ "" ], "metadata": {}, "output_type": "display_data" } ], "prompt_number": 28 }, { "cell_type": "markdown", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 56, "slide_type": "subslide" }, "slideshow": { "slide_type": "slide" } }, "source": [ "## A Closer Look" ] }, { "cell_type": "code", "collapsed": false, "input": [ "pd.concat(dataseries,axis=1).mean(axis=1)" ], "language": "python", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 64 }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 29, "text": [ "BPM 4.83\n", "EpsilonFirst 4.63\n", "EpsilonGreedy 5.25\n", "Greedy 5.15\n", "StratifiedSample 25.72\n", "dtype: float64" ] } ], "prompt_number": 29 }, { "cell_type": "code", "collapsed": false, "input": [ "pd.concat(dataseries,axis=1).var(axis=1)" ], "language": "python", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 65, "slide_helper": "subslide_end" }, "slide_helper": "slide_end", "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 30, "text": [ "BPM 8.869798\n", "EpsilonFirst 62.437475\n", "EpsilonGreedy 26.593434\n", "Greedy 91.684343\n", "StratifiedSample 194.789495\n", "dtype: float64" ] } ], "prompt_number": 30 }, { "cell_type": "markdown", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 65, "slide_type": "subslide" }, "slideshow": { "slide_type": "slide" } }, "source": [ "## What Happens When The Payoffs Change?" 
] }, { "cell_type": "code", "collapsed": false, "input": [ "def genBasicData3(numArms = 3, pulls = 100):\n", "\n", " armPayoffs = rand.uniform(0,1,numArms)\n", "\n", " mab0 = MABs(armPayoffs)\n", " mab1 = MABs(armPayoffs)\n", " mab2 = MABs(armPayoffs)\n", " mab3 = MABs(armPayoffs)\n", " mab4 = MABs(armPayoffs)\n", "\n", " cols = ['StratifiedSample','Greedy','EpsilonGreedy','EpsilonFirst','BPM']\n", " data = pd.DataFrame(columns=cols)\n", "\n", " for i in range(pulls):\n", " \n", " if i == pulls/2:\n", " mab0.armPayoffs = mab1.armPayoffs = mab2.armPayoffs = \\\n", " mab3.armPayoffs = mab4.armPayoffs = rand.uniform(0,1,numArms)\n", " \n", " test = rand.uniform()\n", " arm0 = strategyStratifiedSample(mab0)\n", " result0, regret0 = mab0.pullArm(arm0, test)\n", " arm1 = strategyGreedy(mab1)\n", " result1, regret1 = mab1.pullArm(arm1, test)\n", " arm2 = strategyEpsilonGreedy(mab2, 0.1)\n", " result2, regret2 = mab2.pullArm(arm2, test)\n", " arm3 = strategyEpsilonFirst(mab3, pulls, 0.1)\n", " result3, regret3 = mab3.pullArm(arm3, test)\n", " arm4 = StrategyBPM(mab4)\n", " result4, regret4 = mab4.pullArm(arm4, test)\n", "\n", " row = pd.DataFrame([dict(StratifiedSample = regret0,\n", " Greedy = regret1,\n", " EpsilonGreedy = regret2, \n", " EpsilonFirst = regret3,\n", " BPM = regret4)])\n", " data = data.append(row, ignore_index=True)\n", "\n", " #print mab0.armPayoffs\n", " return data\n", "\n", "def basicSimulation3(sims = 100):\n", " data = genBasicData3()\n", " for i in range(1,sims):\n", " data += genBasicData3()\n", " return data/100.0" ], "language": "python", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 65 }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "prompt_number": 31 }, { "cell_type": "code", "collapsed": false, "input": [ "data = basicSimulation3()" ], "language": "python", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 68 }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "prompt_number": 32 }, { "cell_type": "code", "collapsed": false, "input": [ "basicRegretPlot2(data)" ], "language": "python", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 69 }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "html": [ "" ], "metadata": {}, "output_type": "display_data" } ], "prompt_number": 33 }, { "cell_type": "code", "collapsed": false, "input": [ "def mab3PlotVarying(strategy, pulls = 100):\n", " \n", " figure(tools=\"\", \n", " title = 'Multi-Armed Bandit',\n", " x_axis_label = 'Success Probability',\n", " y_axis_label = 'PDF',\n", " x_range=Range1d(start=0, end=1))\n", "\n", " hold()\n", " \n", " armPayoffs = np.array([0, 0.5, 0])\n", " x1 = np.linspace(0, 1.5, 100)\n", " payoff0 = map(math.sin, x1)\n", " payoff2 = map(math.cos, x1)\n", " armPayoffs[0] = payoff0[0]\n", " armPayoffs[2] = payoff2[0]\n", " mab = MABs(armPayoffs)\n", " \n", " x = np.linspace(0, 1, 100)\n", " \n", " line(x, beta.pdf(x,mab.successes[0],mab.failures[0]), color=\"#2E2EFE\", plot_width=800, plot_height=500, legend='Bandit 1 Posterior')\n", " line([mab.armPayoffs[0],mab.armPayoffs[0]],[0,10], color=\"#2E2EFE\", line_dash = 'dashed', legend='Bandit 1 Truth')\n", "\n", " line(x, beta.pdf(x,mab.successes[0],mab.failures[0]), color=\"#AC58FA\", plot_width=800, plot_height=500, legend='Bandit 2 Posterior')\n", " line([mab.armPayoffs[1],mab.armPayoffs[1]],[0,10], color=\"#AC58FA\", line_dash = 'dashed', legend='Bandit 2 Truth')\n", " \n", " line(x, beta.pdf(x,mab.successes[0],mab.failures[0]), 
color=\"#FE642E\", plot_width=800, plot_height=500, legend='Bandit 3 Posterior')\n", " line([mab.armPayoffs[2],mab.armPayoffs[2]],[0,10], color=\"#FE642E\", line_dash = 'dashed', legend='Bandit 3 Truth')\n", " \n", " grid().grid_line_alpha=0.3\n", "\n", " renderer0 = [r for r in curplot().renderers if isinstance(r, Glyph)][0]\n", " ds0 = renderer0.data_source\n", " \n", " renderer1 = [r for r in curplot().renderers if isinstance(r, Glyph)][1]\n", " ds1 = renderer1.data_source\n", " \n", " renderer2 = [r for r in curplot().renderers if isinstance(r, Glyph)][2]\n", " ds2 = renderer2.data_source\n", " \n", " renderer3 = [r for r in curplot().renderers if isinstance(r, Glyph)][3]\n", " ds3 = renderer3.data_source\n", " \n", " renderer4 = [r for r in curplot().renderers if isinstance(r, Glyph)][4]\n", " ds4 = renderer4.data_source\n", " \n", " renderer5 = [r for r in curplot().renderers if isinstance(r, Glyph)][5]\n", " ds5 = renderer5.data_source\n", " \n", " show()\n", "\n", " for i in range(pulls):\n", " \n", " test = rand.uniform()\n", " \n", " armPayoffs[0] = payoff0[i]\n", " armPayoffs[2] = payoff2[i]\n", " \n", " mab.armPayoffs = armPayoffs\n", " \n", " arm = strategy(mab)\n", " result, regret = mab.pullArm(arm, test)\n", " ds0.data[\"x\"] = x\n", " ds0.data[\"y\"] = beta.pdf(x,mab.successes[0],mab.failures[0])\n", " ds0._dirty = True\n", " \n", " ds1.data[\"x\"] = [payoff0[i],payoff0[i]]\n", " ds1.data[\"y\"] = [0,10]\n", " ds1._dirty = True\n", " \n", " ds2.data[\"x\"] = x\n", " ds2.data[\"y\"] = beta.pdf(x,mab.successes[1],mab.failures[1])\n", " ds2._dirty = True\n", " \n", " ds3.data[\"x\"] = [0.5,0.5]\n", " ds3.data[\"y\"] = [0,10]\n", " ds3._dirty = True\n", " \n", " ds4.data[\"x\"] = x\n", " ds4.data[\"y\"] = beta.pdf(x,mab.successes[2],mab.failures[2])\n", " ds4._dirty = True\n", " \n", " ds5.data[\"x\"] = [payoff2[i],payoff2[i]]\n", " ds5.data[\"y\"] = [0,10]\n", " ds5._dirty = True\n", " \n", " cursession().store_objects(ds0)\n", " cursession().store_objects(ds1)\n", " cursession().store_objects(ds2)\n", " cursession().store_objects(ds3)\n", " cursession().store_objects(ds4)\n", " cursession().store_objects(ds5)\n", " time.sleep(0.01)" ], "language": "python", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 69, "slide_helper": "subslide_end" }, "slide_helper": "slide_end", "slideshow": { "slide_type": "skip" } }, "outputs": [], "prompt_number": 34 }, { "cell_type": "code", "collapsed": false, "input": [ "mab3PlotVarying(strategy=StrategyBPM)" ], "language": "python", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 69, "slide_helper": "subslide_end", "slide_type": "subslide" }, "slide_helper": "slide_end", "slideshow": { "slide_type": "slide" } }, "outputs": [] }, { "cell_type": "markdown", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 69, "slide_type": "subslide" }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Let's Decay Bandit Memory" ] }, { "cell_type": "code", "collapsed": false, "input": [ "class MABdecay(object):\n", " \n", " def __init__(self,armPayoffs,decay):\n", " self.armPayoffs = armPayoffs\n", " self.successes = np.ones(len(armPayoffs))\n", " self.failures = np.ones(len(armPayoffs))\n", " self.decay = decay\n", " \n", " def pullArm(self,arm,test):\n", " result = test < self.armPayoffs[arm]\n", " bestResult = test < max(self.armPayoffs)\n", " regret = bestResult - result\n", " self.successes[arm] += result\n", " self.failures[arm] += not(result)\n", " \n", " #decay\n", " 
self.successes[arm] = max(1, (1-self.decay)*self.successes[arm])\n", " self.failures[arm] = max(1, (1-self.decay)*self.failures[arm])\n", " \n", " return result, regret" ], "language": "python", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 73 }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "prompt_number": 35 }, { "cell_type": "code", "collapsed": false, "input": [ "def mab3PlotVarying2(strategy, pulls = 100, decay = 0.1):\n", " \n", " figure(tools=\"\", \n", " title = 'Multi-Armed Bandit',\n", " x_axis_label = 'Success Probability',\n", " y_axis_label = 'PDF',\n", " x_range=Range1d(start=0, end=1))\n", "\n", " hold()\n", " \n", " armPayoffs = np.array([0, 0.5, 0])\n", " x1 = np.linspace(0, 1.5, 100)\n", " payoff0 = map(math.sin, x1)\n", " payoff2 = map(math.cos, x1)\n", " armPayoffs[0] = payoff0[0]\n", " armPayoffs[2] = payoff2[0]\n", " mab = MABdecay(armPayoffs, decay)\n", " \n", " x = np.linspace(0, 1, 100)\n", " \n", " line(x, beta.pdf(x,mab.successes[0],mab.failures[0]), color=\"#2E2EFE\", plot_width=800, plot_height=500, legend='Bandit 1 Posterior')\n", " line([mab.armPayoffs[0],mab.armPayoffs[0]],[0,10], color=\"#2E2EFE\", line_dash = 'dashed', legend='Bandit 1 Truth')\n", "\n", " line(x, beta.pdf(x,mab.successes[1],mab.failures[1]), color=\"#AC58FA\", plot_width=800, plot_height=500, legend='Bandit 2 Posterior')\n", " line([mab.armPayoffs[1],mab.armPayoffs[1]],[0,10], color=\"#AC58FA\", line_dash = 'dashed', legend='Bandit 2 Truth')\n", " \n", " line(x, beta.pdf(x,mab.successes[2],mab.failures[2]), color=\"#FE642E\", plot_width=800, plot_height=500, legend='Bandit 3 Posterior')\n", " line([mab.armPayoffs[2],mab.armPayoffs[2]],[0,10], color=\"#FE642E\", line_dash = 'dashed', legend='Bandit 3 Truth')\n", " \n", " grid().grid_line_alpha=0.3\n", "\n", " renderer0 = [r for r in curplot().renderers if isinstance(r, Glyph)][0]\n", " ds0 = renderer0.data_source\n", " \n", " renderer1 = [r for r in curplot().renderers if isinstance(r, Glyph)][1]\n", " ds1 = renderer1.data_source\n", " \n", " renderer2 = [r for r in curplot().renderers if isinstance(r, Glyph)][2]\n", " ds2 = renderer2.data_source\n", " \n", " renderer3 = [r for r in curplot().renderers if isinstance(r, Glyph)][3]\n", " ds3 = renderer3.data_source\n", " \n", " renderer4 = [r for r in curplot().renderers if isinstance(r, Glyph)][4]\n", " ds4 = renderer4.data_source\n", " \n", " renderer5 = [r for r in curplot().renderers if isinstance(r, Glyph)][5]\n", " ds5 = renderer5.data_source\n", " \n", " show()\n", "\n", " for i in range(pulls):\n", " \n", " test = rand.uniform()\n", " \n", " armPayoffs[0] = payoff0[i]\n", " armPayoffs[2] = payoff2[i]\n", " \n", " mab.armPayoffs = armPayoffs\n", " \n", " arm = strategy(mab)\n", " result, regret = mab.pullArm(arm, test)\n", " ds0.data[\"x\"] = x\n", " ds0.data[\"y\"] = beta.pdf(x,mab.successes[0],mab.failures[0])\n", " ds0._dirty = True\n", " \n", " ds1.data[\"x\"] = [payoff0[i],payoff0[i]]\n", " ds1.data[\"y\"] = [0,10]\n", " ds1._dirty = True\n", " \n", " ds2.data[\"x\"] = x\n", " ds2.data[\"y\"] = beta.pdf(x,mab.successes[1],mab.failures[1])\n", " ds2._dirty = True\n", " \n", " ds3.data[\"x\"] = [0.5,0.5]\n", " ds3.data[\"y\"] = [0,10]\n", " ds3._dirty = True\n", " \n", " ds4.data[\"x\"] = x\n", " ds4.data[\"y\"] = beta.pdf(x,mab.successes[2],mab.failures[2])\n", " ds4._dirty = True\n", " \n", " ds5.data[\"x\"] = [payoff2[i],payoff2[i]]\n", " ds5.data[\"y\"] = [0,10]\n", " ds5._dirty = True\n", " \n", " cursession().store_objects(ds0)\n", " 
cursession().store_objects(ds1)\n", " cursession().store_objects(ds2)\n", " cursession().store_objects(ds3)\n", " cursession().store_objects(ds4)\n", " cursession().store_objects(ds5)\n", " time.sleep(0.01)" ], "language": "python", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 73, "slide_helper": "subslide_end" }, "slide_helper": "slide_end", "slideshow": { "slide_type": "skip" } }, "outputs": [], "prompt_number": 36 }, { "cell_type": "code", "collapsed": false, "input": [ "mab3PlotVarying2(strategy=StrategyBPM)" ], "language": "python", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 73, "slide_type": "subslide" }, "slideshow": { "slide_type": "slide" } }, "outputs": [ { "html": [ "" ], "metadata": {}, "output_type": "display_data" } ], "prompt_number": 47 }, { "cell_type": "code", "collapsed": false, "input": [ "def genBasicData4(numArms = 3, pulls = 100, decay = 0.1):\n", "\n", " armPayoffs = [0, 0.5, 0]\n", " x = np.linspace(0, 2, 100)\n", " payoff0 = map(math.sin, x)\n", " payoff2 = map(math.cos, x)\n", " armPayoffs[0] = payoff0[0]\n", " armPayoffs[2] = payoff2[0]\n", " \n", " mab0 = MABdecay(armPayoffs, decay = 0.1)\n", " mab1 = MABdecay(armPayoffs, decay = 0.1)\n", " mab2 = MABdecay(armPayoffs, decay = 0.1)\n", " mab3 = MABdecay(armPayoffs, decay = 0.1)\n", " mab4 = MABdecay(armPayoffs, decay = 0.1)\n", "\n", " cols = ['StratifiedSample','Greedy','EpsilonGreedy','EpsilonFirst','BPM']\n", " data = pd.DataFrame(columns=cols)\n", "\n", " for i in range(pulls):\n", " \n", " armPayoffs[0] = payoff0[i]\n", " armPayoffs[2] = payoff2[i]\n", " \n", " mab0.armPayoffs = mab1.armPayoffs = mab2.armPayoffs = \\\n", " mab3.armPayoffs = mab4.armPayoffs = armPayoffs\n", " \n", " test = rand.uniform()\n", " arm0 = strategyStratifiedSample(mab0)\n", " result0, regret0 = mab0.pullArm(arm0, test)\n", " arm1 = strategyGreedy(mab1)\n", " result1, regret1 = mab1.pullArm(arm1, test)\n", " arm2 = strategyEpsilonGreedy(mab2, 0.1)\n", " result2, regret2 = mab2.pullArm(arm2, test)\n", " arm3 = strategyEpsilonFirst(mab3, pulls, 0.1)\n", " result3, regret3 = mab3.pullArm(arm3, test)\n", " arm4 = StrategyBPM(mab4)\n", " result4, regret4 = mab4.pullArm(arm4, test)\n", "\n", " row = pd.DataFrame([dict(StratifiedSample = regret0,\n", " Greedy = regret1,\n", " EpsilonGreedy = regret2, \n", " EpsilonFirst = regret3,\n", " BPM = regret4)])\n", " data = data.append(row, ignore_index=True)\n", "\n", " return data\n", "\n", "def basicSimulation4(sims = 100):\n", " data = genBasicData4()\n", " dataseries = []\n", " dataseries.append(data.sum())\n", " for i in range(1,sims):\n", " d = genBasicData4()\n", " data += d\n", " dataseries.append(d.sum())\n", " return data/100.0, dataseries" ], "language": "python", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 73, "slide_helper": "subslide_end" }, "slide_helper": "slide_end", "slideshow": { "slide_type": "skip" } }, "outputs": [], "prompt_number": 37 }, { "cell_type": "code", "collapsed": false, "input": [ "data, dataseries = basicSimulation4()" ], "language": "python", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 73, "slide_type": "subslide" }, "slideshow": { "slide_type": "slide" } }, "outputs": [], "prompt_number": 38 }, { "cell_type": "code", "collapsed": false, "input": [ "basicRegretPlot2(data)" ], "language": "python", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 78, "slide_helper": "subslide_end" }, "slide_helper": "slide_end", 
"slideshow": { "slide_type": "fragment" } }, "outputs": [ { "html": [ "" ], "metadata": {}, "output_type": "display_data" } ], "prompt_number": 39 }, { "cell_type": "markdown", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 78, "slide_type": "subslide" }, "slideshow": { "slide_type": "slide" } }, "source": [ "## A Closer Look Again" ] }, { "cell_type": "code", "collapsed": false, "input": [ "pd.concat(dataseries,axis=1).mean(axis=1)" ], "language": "python", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 80 }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 40, "text": [ "BPM 13.71\n", "EpsilonFirst 24.70\n", "EpsilonGreedy 14.07\n", "Greedy 14.39\n", "StratifiedSample 37.00\n", "dtype: float64" ] } ], "prompt_number": 40 }, { "cell_type": "code", "collapsed": false, "input": [ "pd.concat(dataseries,axis=1).var(axis=1)" ], "language": "python", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 81, "slide_helper": "subslide_end" }, "slide_helper": "slide_end", "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 41, "text": [ "BPM 25.420101\n", "EpsilonFirst 38.191919\n", "EpsilonGreedy 65.479899\n", "Greedy 61.715051\n", "StratifiedSample 14.989899\n", "dtype: float64" ] } ], "prompt_number": 41 }, { "cell_type": "markdown", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 81, "slide_helper": "subslide_end", "slide_type": "subslide" }, "slide_helper": "slide_end", "slideshow": { "slide_type": "slide" } }, "source": [ "## Summary\n", "\n", "1. These methods are simple\n", "2. Surprisingly flexible and extendable\n", " - Time varying\n", " - Batch processing\n", " - Differing payoffs\n", " - Pulling multiple arms simultaneously\n", "3. Set-it-and-forget-it\n", "4. Some upfront effort matching algorithm to your problem\n", "5. Easily testible\n", "6. Epsilon first is A/B testing\n", "7. Simulation is awesome!\n" ] }, { "cell_type": "markdown", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 81, "slide_helper": "subslide_end", "slide_type": "subslide" }, "slide_helper": "slide_end", "slideshow": { "slide_type": "slide" } }, "source": [ "## Our Implementation\n", "\n", "1. Custom BPM\n", "2. Update our posterior with information from multiple models\n", "3. Update sampled beta beliefs using dynamic linear models \n", "4. Batch process information\n", "5. Sparse information from millions of arms for thousands of campaigns" ] }, { "cell_type": "markdown", "metadata": { "internals": { "frag_helper": "fragment_end", "frag_number": 81, "slide_helper": "subslide_end", "slide_type": "subslide" }, "slide_helper": "slide_end", "slideshow": { "slide_type": "slide" } }, "source": [ "# Thank You\n", "![logo](maxpoint.png)\n", "\n", "## #9 on Deloitte 2013 Technology Fast 500\u2122 Rankings\n", "\n", "### michael@maxpoint.com\n", "\n", "# We are hiring!\n" ] } ], "metadata": {} } ] }