### A Pluto.jl notebook ###
# v0.20.4

using Markdown
using InteractiveUtils

# This Pluto notebook uses @bind for interactivity. When running this notebook outside of Pluto, the following 'mock version' of @bind gives bound variables a default value (instead of an error).
macro bind(def, element)
    #! format: off
    quote
        local iv = try Base.loaded_modules[Base.PkgId(Base.UUID("6e696c72-6542-2067-7265-42206c756150"), "AbstractPlutoDingetjes")].Bonds.initial_value catch; b -> missing; end
        local el = $(esc(element))
        global $(esc(def)) = Core.applicable(Base.get, el) ? Base.get(el) : iv(el)
        el
    end
    #! format: on
end

# ╔═╡ 8a796500-8722-11eb-27c9-dd7f6d6d18fb
begin
    using LaTeXStrings
    using PlutoUI
    using Plots
end

# ╔═╡ 53885fc8-47a3-4fe7-a575-058f1ba94a81
html""" """

# ╔═╡ 199fe28a-870e-11eb-3e76-a555f3cca82d
html""

# ╔═╡ 03008ac2-870e-11eb-0407-758be6a15570
md"
# Dynamic Programming 101

Florian Oswald, Carlo Alberto 2025
"

# ╔═╡ 413b8300-870e-11eb-3348-f5db37cc4908
md"
## Introduction

* Dynamic Programming is a powerful technique for dynamic problems, within and outside of economics.
* Some examples:
  1. How to allocate spending over time, given a certain budget?
  2. How to position truck drivers across the USA, if demand is random?
  3. How to plan mid-air refuelling of military aircraft, if future needs are uncertain?
  4. What's the best route for a travelling salesman to visit a number of cities, given a certain road network?
"

# ╔═╡ bc7c2128-870e-11eb-0807-bd44c9a21425
md"
#

* Here we will focus on deterministic problems, i.e. no uncertainty.
* Let's start with a simple example that you have probably heard of already: Cake Eating. 🍰
* Then we will introduce some notation and terminology.
* Finally, we will look at some algorithms you might use to solve this kind of problem.
"

# ╔═╡ 584188c8-870f-11eb-1130-a327b00705f2
md"
# Deterministic Dynamic Programming: Piece of 🍰

* Assume you have a given budget $R$ of resources to allocate to spending at dates $t = 1,\dots,T$. You can think of $R$ Euros, for example.
* Let $a_t$ be the *action* you take at date $t$. Here: how much to spend at date $t$.
* Let $C_t(a_t)$ be the *reward*, the *consumption*, or the *utility* you obtain from that action.
* What you see below is typically described as _action = consumption_, hence a _consumption-savings problem_, but let's keep it general and use _action_ instead.
"

# ╔═╡ 6cb6e2f2-8710-11eb-0452-470650c6beb0
md"
## A *Budgeting Problem*

* Our aim is to *maximize* our total utility:

$$\max_a \sum_{t\in\mathcal{T}} C_t(a_t)$$

* But we cannot spend more than our total resources, so this is subject to the constraint

$$\sum_{t\in\mathcal{T}} a_t = R$$

* and we don't allow borrowing: $a_t \geq 0$.
"

# ╔═╡ ce122584-8710-11eb-04e7-fb89bd1fdf65
md"
#

* We are looking for the *best* way to decide on actions $a_t$.
* So, we want a list of values $a_t$ of what to do in each period $t$ - given a certain $R$!
* We want to know this function:

$$V_t(R_t) = \text{value of having }R_t\text{ resources left to allocate to dates }t,t+1,...$$
"

# ╔═╡ 4d554b78-8711-11eb-10b7-11c968ce6a04
md"
## States and Transitions

* The stock of resources evolves as

$$R_{t+1} = R_t - a_t$$

which is called the *transition function*.

* We call $R_t$ the *state variable*. It encodes all the relevant information needed to model the system *forward, starting in $t$*.
* A more general way of writing the above would be

$$R_{t+1} = g(R_t,a_t) = R_t - a_t$$
"
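# ╔═╡ 5c0d2e9a-8711-11eb-2f31-8b1a2c3d4e5f
md"
* To fix ideas in code (a minimal sketch added for illustration - the names `g` and `C` are our own choice, and the square-root reward anticipates the example used below):
"

# ╔═╡ 6d1e3fab-8711-11eb-3042-9c2b3d4e5f60
let
    # illustration only - not part of the original lecture code
    g(R, a) = R - a    # transition function: resources left after action a
    C(a) = sqrt(a)     # reward from action a
    # one step forward: spend 3 out of 10 units of resources
    (Rnext = g(10.0, 3.0), reward = C(3.0))
end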
# ╔═╡ 10c41fee-8712-11eb-0048-d5caa9c92210
md"
## Optimality Condition and Bellman Equation

* The relationship between $V_t(R_t)$ and $V_{t+1}(R_{t+1})$ is then

$$V_t(R_t) = \max_{0 \leq a_t \leq R_t} \left[ C_t(a_t) + V_{t+1}(g(R_t,a_t)) \right]$$

* This says:
> The value of having $R_t$ resources left in period $t$ is the value of optimizing current spending (the $\max C_t(a_t)$ part) plus the value of then having $g(R_t,a_t)$ units left going forward.
* This equation is commonly known as the _Bellman Equation_.
* Great. Given $R_t$ we could just try out different $a_t$ and see which gives the highest value. But...what on earth is $V_{t+1}(\cdot)$? 🤔
"

# ╔═╡ 1d4e0706-8713-11eb-33d8-b1e6568595af
md"
## Finite Time Backward Induction

* One simple way out is to say that time is finite, stops at $T$, and the value of leftover resources is zero: $V_{T+1}(R) = 0$.
* That makes the last period

$$V_T(R_T) = \max_{0 \leq a_T \leq R_T} C_T(a_T)$$

for *any* value $R_T$. That's a *function* right there.
* But what is the value of that function? What is

$$\max_{0 \leq a_T \leq R_T} C_T(a_T)?$$

#

* If rewards are increasing in $a$ (as with the $\sqrt{\cdot}$ reward below), it is optimal to consume everything that is left: $a_T = R_T$, so $V_T(R_T) = C_T(R_T)$.
* Ok, well then the preceding period is easy. For whatever $R_{T-1}$ we enter the period with, and for whatever is left after action $a_{T-1}$, we know *the future*:

$$V_{T-1}(R_{T-1}) = \max_{0 \leq a_{T-1} \leq R_{T-1}} \left[ C_{T-1}(a_{T-1}) + V_{T}(g(R_{T-1},a_{T-1})) \right]$$

* Of course this will work all the way back to period 1. If $a_t$ is discrete, we do not need *any* assumptions on $C$ other than finiteness - what we are proposing here will be optimal.
"

# ╔═╡ 0372f860-8712-11eb-20d3-43c71bf6f024
md"
## Discrete Action Space, Finite Time

* Let's try that out:
* we have $C_t(a_t) = \sqrt{a_t}$
* and $V_{T+1}(R) = 0$
* so it's optimal to consume whatever is left in the final period. There is not even any maximization problem to solve.
"

# ╔═╡ cefec74c-8722-11eb-22d9-7551eeb4cc96
md"
#
"

# ╔═╡ a82db4cc-8801-4cdd-a96a-7c0d9fb1c617


# ╔═╡ 082cc9ac-87f9-11eb-180a-09f2fbd87d25
md"
#
"

# ╔═╡ d4854ff4-8722-11eb-1ecf-89fd39b87db9
md"
#

* Ok, now we have $V_T$.
* Let's go back one period now.
"

# ╔═╡ 636d55f6-8725-11eb-2f78-691fc074cb77
md"
#
"

# ╔═╡ 8822b3a8-8725-11eb-149e-25a52208834d
md"
#

* And here are the corresponding optimal actions for both periods:
"

# ╔═╡ 6b42dfd0-8725-11eb-1f90-eb215d9a4daf
md"
#

* Great. Now let's go all the way back to period 1!
"
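# ╔═╡ 7a2b4c5e-8725-11eb-1a2b-3c4d5e6f7a8b
md"
* Before we write the general code, here is the $T-1$ logic spelled out on a toy case (a sketch added for illustration - the integer budget and action set are our own choice): with $R_{T-1} = 2$, compare $\sqrt{a} + V_T(2-a)$ for $a \in \{0,1,2\}$ and pick the best.
"

# ╔═╡ 8b3c5d6f-8725-11eb-2b3c-4d5e6f7a8b9c
let
    # illustration only - not part of the original lecture code
    VT_toy(R) = sqrt(R)                              # last period: consume everything
    actions = 0:2                                    # feasible actions when R = 2
    values = [sqrt(a) + VT_toy(2 - a) for a in actions]
    best, ix = findmax(values)
    (value = best, action = actions[ix])             # a = 1: smoothing consumption wins
end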
""" function Vperiod(grid,vplus::Vector) points = length(grid) w = zeros(points) # temporary vector for each choice or R' Vt = zeros(points) # optimal value in T-1 at each state of R ix = 0 # optimal action index in T-1 at each state of R at = zeros(points) # optimal action in T-1 at each state of R for (ir,r) in enumerate(grid) # for all possible R-values # loop over all possible action choices for (ia,achoice) in enumerate(grid) if r <= achoice # check whether that choice is feasible w[ia] = -Inf else w[ia] = sqrt(achoice) + vplus[ir - ia] # value of that achoice end end # find best action Vt[ir], ix = findmax(w) # stores Value und policy (index of optimal choice) at[ir] = grid[ix] # record optimal action level end return (Vt, at) end # ╔═╡ fc829384-8726-11eb-358e-5dffd3f96f6f md" # * Let's just call `Vperiod` for all periods $T-2,\dots,1$, going backwards. * By the way, the function `Vperiod` is also called the *Bellman Opterator*: It performs the operation on the RHS of the Bellman Equation!" # ╔═╡ 24bc7018-8727-11eb-315d-ad2f90883d5a function backwards(grid, nperiods) points = length(grid) V = zeros(nperiods,points) a = zeros(nperiods,points) V[end,:] = sqrt.(grid) # from before: final period a[end,:] = collect(grid) for it in (nperiods-1):-1:1 x = Vperiod(grid, V[it+1,:]) V[it,:] = x[1] a[it,:] = x[2] end return (V,a) end # ╔═╡ 18a59a1a-8728-11eb-3d1e-b3a00283280c md" # " # ╔═╡ dd0257b2-8724-11eb-3e68-8791b59f85f4 @bind highR Slider(2:200,show_value = true, default = 20) # ╔═╡ ac31d660-8721-11eb-226b-49124f9093de begin # final period T points = 500 lowR = 0.0001 # highR = 10.0 # slider below Rspace = range(lowR, stop = highR, length = points) aT = Rspace # consume whatever is left VT = sqrt.(aT) # utility of that consumption end # ╔═╡ 9467efca-8722-11eb-12ff-7f5446bcf618 function plotVT() plot(Rspace, VT, xlab = "R", ylab = "Value",label = L"V_T", m = (:circle), leg = :bottomright) end # ╔═╡ f495704e-8729-11eb-3865-cfe53d7ecb4e plotVT() # ╔═╡ 1c5a5cc6-8760-11eb-1c63-030f77621098 begin # period T-1 # now for each value in Rspace, we need to decide how much to consume w = zeros(points) # temporary vector for each choice or R' VT_1 = zeros(points) # optimal value in T-1 at each state of R ix = 0 # optimal index of action in T-1 at each state of R aT_1 = zeros(points) # optimal action in T-1 at each state of R for (ir,r) in enumerate(Rspace) # for all possible R-values # loop over all possible action choices for (ia,achoice) in enumerate(Rspace) if r <= achoice # check whether that choice is feasible w[ia] = -Inf else rprime = ir - ia # state transition w[ia] = sqrt(achoice) + VT[rprime] # value of that achoice end end # find best action VT_1[ir], ix = findmax(w) # stores Value und policy (index of optimal choice) aT_1[ir] = Rspace[ix] # record optimal action level end end # ╔═╡ df006604-8723-11eb-0d67-17b8090add25 let pv = plotVT() plot!(pv, Rspace, VT_1, label = L"V_{T-1}", m = (:star)) end # ╔═╡ 96f139f4-8725-11eb-3e96-596719146475 plotaT() = plot(Rspace, aT ,label = L"a_T",leg = :topleft,ylab = "action",xlab = "R") # ╔═╡ 36c341e8-8726-11eb-2bd5-cf97efd66410 let pa = plotaT() plot!(pa, Rspace, aT_1, label = L"a_{T-1}") end # ╔═╡ cc9e9b2c-8762-11eb-10a8-45e5052c0ccd @bind nperiods Slider(2:20,show_value = true, default = 5) # ╔═╡ 04d66cda-8728-11eb-13bb-6b97b660bb42 V,a = backwards(Rspace, nperiods); # ╔═╡ 301ee980-8728-11eb-200b-2dfe31f59818 let cg = cgrad(:viridis) cols = cg[range(0.0,stop=1.0,length = nperiods)] pv = plot(Rspace, V[1,:], xlab = "R", ylab = "Value",label = L"V_1",leg 
# ╔═╡ 301ee980-8728-11eb-200b-2dfe31f59818
let
    cg = cgrad(:viridis)
    cols = cg[range(0.0, stop = 1.0, length = nperiods)]
    pv = plot(Rspace, V[1,:], xlab = "R", ylab = "Value", label = L"V_1", leg = :bottomright, color = cols[1])
    for it in 2:nperiods
        plot!(pv, Rspace, V[it,:], label = L"V_{%$(it)}", color = cols[it])
    end
    pv
end

# ╔═╡ 521cc474-872a-11eb-16c6-cb896e828195
md"
#

* Same for the optimal actions:
"

# ╔═╡ 5b9f63d0-872a-11eb-292c-5d0f3e2756da
let
    cg = cgrad(:viridis)
    cols = cg[range(0.0, stop = 1.0, length = nperiods)]
    pa = plot(Rspace, a[1,:], xlab = "R", ylab = "Action", label = L"a_1", leg = :topleft, color = cols[1])
    for it in 2:nperiods
        plot!(pa, Rspace, a[it,:], label = L"a_{%$(it)}", color = cols[it])
    end
    pv = plot(Rspace, V[1,:], xlab = "R", ylab = "Value", label = L"V_1", leg = :bottomright, color = cols[1])
    for it in 2:nperiods
        plot!(pv, Rspace, V[it,:], label = L"V_{%$(it)}", color = cols[it])
    end
    plot(pv, pa, layout = (1,2))
end

# ╔═╡ 60cca6fe-8763-11eb-2845-db08468ce26a
md"
# Interpretation

* Does that solution make *sense*?
* What does the optimal policy prescribe to do, at a certain level of $R$, in each period?
"

# ╔═╡ 98270f9a-8763-11eb-0850-578a84382a35
bar(1:nperiods, a[:, end], leg = false, title = "Given R_t = $(Rspace[end]), take action...", xlab = "period")

# ╔═╡ 89680362-8761-11eb-3dd6-4d217063c82a
md"
# An Equivalent Formulation

* Let's slightly change the problem:

$$V_{T-1}(R_{T-1}) = \max_{R'