{ "cells": [ { "cell_type": "markdown", "metadata": { "pycharm": {} }, "source": [ "\n", "*This notebook contains course material from [CBE40455](https://jckantor.github.io/CBE40455) by\n", "Jeffrey Kantor (jeff at nd.edu); the content is available [on Github](https://github.com/jckantor/CBE40455.git).\n", "The text is released under the [CC-BY-NC-ND-4.0 license](https://creativecommons.org/licenses/by-nc-nd/4.0/legalcode),\n", "and code is released under the [MIT license](https://opensource.org/licenses/MIT).*" ] }, { "cell_type": "markdown", "metadata": { "pycharm": {} }, "source": [ "\n", "< [Scenario Analysis for a Plant Expansion](http://nbviewer.jupyter.org/github/jckantor/CBE40455/blob/master/notebooks/06.02-Scenario-Analysis-for-a-Plant-Expansion.ipynb) | [Contents](toc.ipynb) | [Risk Averse Gambler](http://nbviewer.jupyter.org/github/jckantor/CBE40455/blob/master/notebooks/06.04-Risk-Averse-Gambler.ipynb) >
"
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {}
},
"source": [
"# Risk Neutral Gambler"
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {}
},
"source": [
"## Problem Statement"
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {}
},
"source": [
"The risk-neutral gambler enters a game and bets repeatedly until he or she either reaches a goal $N$ or runs out of money. Starting from a stake $x$, a wager $u$ results in a stake of $x+u$ with probability $p$, or $x-u$ with probability $q$ (where $p + q \\leq 1$). The wager can be no larger than the stake or any maximum wager established for the game. The future value of money may be discounted by a factor $a \\leq 1$.\n",
"\n",
"Given an initial stake $x \\lt N$, what is the optimal gambling strategy?"
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {}
},
"source": [
"## Formulation"
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {}
},
"source": [
"This classic problem in Dynamic Programming is discussed, for example, by Richard Sutton and Andrew Barto in their book [_Reinforcement Learning_ (MIT Press, 1998)](http://mitpress.mit.edu/books/reinforcement-learning). The function $V(k,x)$ is the expected value of the game after the $k^{th}$ wager with a stake $x$. If the gambler reaches the goal stake $N$ after the $k^{th}$ wager, then the value of the game is $V(k,N) = N$. If the gambler loses everything, then $V(k,0) = 0$. Otherwise, for $0 \\lt x \\lt N$, the Bellman optimality equation provides the recursion\n",
"\n",
"$$V(k-1,x) = a \\max_u \\left[ p V(k,x+u) + q V(k,x-u) \\right]$$\n",
"\n",
"where $a$ is the discount factor for future values. The maximization is over the set of possible bets ranging from $0$ to the minimum of $x$, $N-x$, or the bet limit $B$. Note that the state space and set of control actions are finite."
]
},
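{
"cell_type": "markdown",
"metadata": {
"pycharm": {}
},
"source": [
"The Bellman recursion can be checked numerically by iterating it to a fixed point. The cell below is an illustrative value-iteration sketch in Python, not part of the original model; the parameters match the defaults used in the MathProg model below ($N = 100$, $p = 0.25$, $a = 1$)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"pycharm": {}
},
"outputs": [],
"source": [
"# Value iteration for the risk-neutral gambler (illustrative sketch).\n",
"N, p, a = 100, 0.25, 1.0\n",
"q = 1.0 - p\n",
"\n",
"# Boundary conditions: V[0] = 0 and V[N] = N.\n",
"V = [0.0]*N + [float(N)]\n",
"for sweep in range(10000):\n",
"    Vnew = V[:]\n",
"    for x in range(1, N):\n",
"        # Feasible wagers cannot exceed the stake x or the distance N - x.\n",
"        Vnew[x] = a*max(p*V[x+u] + q*V[x-u] for u in range(1, min(x, N-x) + 1))\n",
"    if max(abs(v1 - v0) for v0, v1 in zip(V, Vnew)) < 1e-9:\n",
"        V = Vnew\n",
"        break\n",
"    V = Vnew\n",
"\n",
"print('V[50] = %.4f' % V[50])"
]
},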
{
"cell_type": "markdown",
"metadata": {
"pycharm": {}
},
"source": [
"## Solution by Linear Programming"
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {}
},
"source": [
"The optimality equation can be solved by well-known policy iteration methods. Alternatively, as shown, for example, by Sheldon Ross in [_Introduction to Stochastic Dynamic Programming_ (Academic Press, 1983)](http://www.amazon.com/Introduction-Stochastic-Dynamic-Programming-Sheldon/dp/0125984219), an exact solution can \n",
"be found by linear programming. We seek a stationary solution $V[x]$ by minimizing $\\sum_{x \\in 0..N} V[x]$ subject to \n",
"\n",
"$$ V[x] \\geq a \\left( p V[x + u] + q V[x-u]\\right) $$\n",
"\n",
"for all feasible bets $u$ and all $x$ in $1..N-1$, with boundary conditions $V[0] = 0$ and $V[N] = N$. The optimal wagers $u[x]$ are found by determining which constraints are active at optimality; $u[x]$ may take multiple values."
]
},
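{
"cell_type": "markdown",
"metadata": {
"pycharm": {}
},
"source": [
"As a cross-check of the MathProg model, the same linear program can be set up directly in Python. The cell below is a minimal, illustrative sketch that assumes `numpy` and `scipy` are available (neither is a dependency of the original notebook)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"pycharm": {}
},
"outputs": [],
"source": [
"# Linear programming solution of the gambler's problem (illustrative sketch).\n",
"# Assumes numpy and scipy; the notebook itself solves the same LP with GLPK.\n",
"import numpy as np\n",
"from scipy.optimize import linprog\n",
"\n",
"N, p, a, B = 100, 0.25, 1.0, 100\n",
"q = 1.0 - p\n",
"n = N + 1\n",
"\n",
"# Objective: minimize the sum of V[x] over all states.\n",
"c = np.ones(n)\n",
"\n",
"# Inequality constraints: -V[x] + a*(p*V[x+u] + q*V[x-u]) <= 0.\n",
"rows = []\n",
"for x in range(1, N):\n",
"    for u in range(1, min(B, x, N - x) + 1):\n",
"        row = np.zeros(n)\n",
"        row[x] += -1.0\n",
"        row[x + u] += a*p\n",
"        row[x - u] += a*q\n",
"        rows.append(row)\n",
"A_ub = np.array(rows)\n",
"b_ub = np.zeros(len(rows))\n",
"\n",
"# Boundary conditions: V[0] = 0 and V[N] = N.\n",
"A_eq = np.zeros((2, n))\n",
"A_eq[0, 0] = 1.0\n",
"A_eq[1, N] = 1.0\n",
"b_eq = np.array([0.0, float(N)])\n",
"\n",
"res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,\n",
"              bounds=[(None, None)]*n, method='highs')\n",
"V = res.x\n",
"\n",
"# Optimal wagers are the wagers whose constraints are active at optimality.\n",
"bets = {x: [u for u in range(1, min(B, x, N - x) + 1)\n",
"            if abs(-V[x] + a*(p*V[x+u] + q*V[x-u])) < 1e-6]\n",
"        for x in range(1, N)}\n",
"print('V[50] = %.4f, optimal wagers at x = 50: %s' % (V[50], bets[50]))"
]
},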
{
"cell_type": "markdown",
"metadata": {
"pycharm": {}
},
"source": [
"## MathProg Model"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false,
"pycharm": {}
},
"outputs": [],
"source": [
"%%script glpsol -m /dev/stdin -y output.txt --out output\n",
"\n",
"/* Problem Parameters. Any of these can be adjusted in a data section. */\n",
"\n",
"param N default 100, >= 1; # Goal\n",
"param p default 0.25, >= 0, <= 1; # Winning probability\n",
"param q default 1-p, >= 0, <= 1-p; # Losing probability\n",
"param B default N, >= 1, <= N; # Maximum wager size\n",
"param a default 1, >= 0, <= 1; # Discount factor\n",
"\n",
"/* Set of States */\n",
"\n",
"set X:= 0..N;\n",
"\n",
"/* Sets of possible wagers. These are parameterized by the State */\n",
"\n",
"set U{x in X} := 1..min(B,min(N-x,x));\n",
"\n",
"/* Value function */\n",
"\n",
"var V{X};\n",
"\n",
"/* Exact Linear Program Equivalent of the DP */\n",
"\n",
"minimize OBJ: sum{x in X} V[x] ;\n",
"\n",
"s.t. C1 {x in 1..N-1, u in U[x]}: V[x] >= a*(p*V[x+u] + q*V[x-u]);\n",
"s.t. C2: V[0] = 0;\n",
"s.t. C3: V[N] = N;\n",
"\n",
"solve;\n",
"\n",
"table tab1 {x in X} OUT \"CSV\" \"output.csv\" : \n",
" x~Stake, V[x]~ExpectedValue;\n",
"\n",
"printf \" Goal = %4d\", N;\n",
"printf \"\\n Maximum Bet = %4d\", B;\n",
"printf \"\\nWinning Probability = %8.3f\", p ;\n",
"printf \"\\n Losing Probability = %8.3f\", q ;\n",
"printf \"\\n Discount Factor = %8.3f\", a;\n",
"printf \"\\n\\n %7s %10s %4s\\n\",'x','V[x]','u[x]: Optimal Wagers';\n",
"printf \" %7s %10s %4s\" ,'-','----','---------------------';\n",
"for {x in X}{\n",
" printf \"\\n %7d %10.4f \",x, V[x];\n",
" printf {u in U[x]: abs(-V[x] + a*(p*V[x+u] + q*V[x-u])) < 0.00001} \" %3d\",u;\n",
"}\n",
"\n",
"end;\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.10"
}
},
"nbformat": 4,
"nbformat_minor": 0
}