Reading notes on the first 8 chapters of *The Art and Theory of Dynamic Programming*.

Dynamic programming cannot be passively taught, but must be *actively* learned by *considerable practice* with solving problems *on one's own*. Why dynamic programming is more *art* than *science* we do not know. But *common sense*, not mathematical manipulation, is what it takes to be successful in a course in dynamic programming.

- Goals:
  - Primary: identify *state* and *stage* variables, and define and characterize the *optimal value function*.
  - Secondary: use DP analytically to establish the *structure* of the optimal solution and the necessary conditions.
  - Final: special techniques.
- Chapters:
  - 1-8: deterministic
  - 9-14: stochastic
  - 15: learning related
- When used in undergraduate classes: 1-4, 6(1-4), 9-11, 13, 15(1-5).

Definition: optimization procedure that is particularly applicable to problems requiring a sequence of interrelated decisions.

Sequence of *decisions* to sequence of *situations*, to *optimize* some measure of *value*.

- Problem: one-way street, Manhattan-style city, from A to B.

Define sub-problems, solve the original problem by using the results from sub-problems.

So we can solve all the sub-problems recursively in some order.

- Definitions:
  - **optimal value function**: the rule that assigns values to the various subproblems; it has **arguments**.
  - **optimal policy function**: the rule that associates the best first decision with each subproblem.
  - **recurrence relation**: a formula or set of formulas relating various values of the *OVF*.
  - **boundary conditions**.

- Procedure:
- To solve a problem by dynamic programming, we choose the arguments of the optimal value function and define that function to allow the use of the principle of optimality to write a recurrence relation.
- Starting with the boundary conditions, we then use the recurrence relation to determine concurrently the optimal value and policy functions.
- When the optimal value and decision are known for the value of the argument that represents the original whole problem, the solution is completed and the best path can be traced out using the optimal policy function alone.

- The time complexity of the previous problem is `$O(N^2)$`.
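The procedure above (recurrence, boundary condition, then tracing out the path via the policy function alone) can be sketched in code. A minimal backward DP for the grid-routing problem, assuming (illustratively) an `n x n` one-way grid where `east[i][j]` and `south[i][j]` are the arc costs out of intersection `(i, j)`:

```python
# Shortest path on an n x n one-way grid: from corner (0, 0) to (n-1, n-1),
# moving only east or south.  east[i][j] / south[i][j] are illustrative arc costs.

def grid_shortest_path(east, south):
    n = len(east)
    INF = float("inf")
    # f[i][j] = optimal value function: cheapest cost from (i, j) to (n-1, n-1)
    f = [[INF] * n for _ in range(n)]
    policy = [[None] * n for _ in range(n)]
    f[n - 1][n - 1] = 0  # boundary condition
    for i in range(n - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if i + 1 < n and south[i][j] + f[i + 1][j] < f[i][j]:
                f[i][j], policy[i][j] = south[i][j] + f[i + 1][j], "S"
            if j + 1 < n and east[i][j] + f[i][j + 1] < f[i][j]:
                f[i][j], policy[i][j] = east[i][j] + f[i][j + 1], "E"
    # trace out the best path using the optimal policy function alone
    path, i, j = [], 0, 0
    while (i, j) != (n - 1, n - 1):
        path.append(policy[i][j])
        i, j = (i + 1, j) if policy[i][j] == "S" else (i, j + 1)
    return f[0][0], path
```

Each of the roughly `N^2` intersections is examined once with constant work, which is where the `O(N^2)` bound comes from.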

- We can reverse the viewpoints of start point and end point to obtain another representation of the sub-problems. This is called the *forward* dynamic programming procedure.

- Problem: when turning along the path requires an additional cost.

- Solution: add one more *argument* to the *optimal value function*.

The art of dynamic-programming formulation involves the proper choice of the arguments for the optimal value function.

A good way to determine the right number of arguments of the *OVF* is to think of it as "asking the consultant question."

- In the *OVF*, the *monotonic* argument is called the *stage* variable; all the others are called *state* variables.

Usually, the cost of a certain step *does not depend* on the stage/time. The doubling-up procedure computes stage `$M+N$` from stage `$M$` and stage `$N$`.

In `N` time periods, we originally have a machine of age `y`. `c(i)` is the cost of operating a machine of age `i`; `p` is the price of a new machine; `t(i)` is the trade-in value of a machine of age `i`; `s(i)` is the salvage value of a machine of age `i` at the end of the `N` time periods. The problem is to minimize the total cost during the next `N` time periods.

- The optimal value function is trivial and left as exercises.
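Since the book leaves the formulation as an exercise, here is one plausible sketch (my own, not the book's): let `f(i, j)` be the minimal cost of periods `i..N-1` when the machine has age `j`, choosing each period between keeping the machine and trading it in for a new one, with salvage at the horizon as the boundary condition.

```python
from functools import lru_cache

def min_total_cost(N, y, c, p, t, s):
    """f(i, j): minimal cost of periods i..N-1 with a machine of age j,
    counting salvage at the end.  c, t, s are functions of age; p is the
    price of a new machine.  (Recurrence is a sketch, not the book's.)"""

    @lru_cache(maxsize=None)
    def f(i, j):
        if i == N:
            return -s(j)                         # boundary: salvage the machine
        keep = c(j) + f(i + 1, j + 1)            # operate one more period
        replace = p - t(j) + c(0) + f(i + 1, 1)  # trade in, buy new, operate it
        return min(keep, replace)

    return f(0, y)
```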

- Using *age of machine* as the y-axis and *time* as the x-axis, we can represent the problem as a shortest-path problem.

**TODO**

Add a third choice, "overhaul", between "keep" and "replace".

And other variants in the exercises.

Not really *dynamic* (i.e., requiring a *sequence* of decisions): `X` units of a resource are to be distributed among `N` activities. Each activity has a return function `r_i(x)`. How should the resource be allocated to maximize the total return?

- To use dynamic programming, we view the solution set as stages.

This problem can be represented as a longest-path problem.
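A minimal stage-by-stage sketch of this allocation DP, treating each activity as a stage (integer units and all names are illustrative assumptions):

```python
def allocate(X, returns):
    """Maximize sum of r_i(x_i) subject to sum x_i == X (integer units).
    `returns` is a list of return functions r_i, one per activity; each
    activity is treated as one DP stage."""
    NEG = float("-inf")
    # f[x] = best total return over the activities seen so far using exactly x units
    f = [0.0] + [NEG] * X
    choice = []
    for r in returns:
        g = [NEG] * (X + 1)
        best = [0] * (X + 1)
        for x in range(X + 1):
            for k in range(x + 1):          # give k units to this activity
                if f[x - k] != NEG and f[x - k] + r(k) > g[x]:
                    g[x], best[x] = f[x - k] + r(k), k
        f = g
        choice.append(best)
    # trace back the optimal allocation, last activity first
    alloc, x = [], X
    for best in reversed(choice):
        alloc.append(best[x])
        x -= best[x]
    alloc.reverse()
    return f[X], alloc
```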

- Variants:
  - forward DP procedure
  - return function not monotonic
  - doubling-up procedure when activities have the same return function
  - two- or three-dimensional resource
  - additional requirement: the total allocation for a subset has a lower limit
  - additional requirement: at most `M` of the allocations may be nonzero
  - additional requirement: if `k` activities are nonzero, a cost `h(k)` is assessed
  - two kinds of return, money and public goodwill: maximize money while goodwill has a lower limit

Two kinds of resource, each with a unit cost; initially given money, how do we buy resources and allocate them optimally?

Variant: three kinds of resource.

- TODO: it's so brand new...

- TODO: skipped

- TODO: skipped

- TODO: skipped

- TODO: skipped

- Shortest path in a general directed graph.

Graphs with no cycle.

Topological order: DP stages.

Can find not only the shortest distance but also the longest one.
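A sketch of this idea, using Python's standard `graphlib` to produce the topological order that serves as the DP stages (the arc representation is an illustrative assumption):

```python
from graphlib import TopologicalSorter  # Python 3.9+

def dag_paths(arcs, source):
    """Shortest and longest distances from `source` in an acyclic digraph.
    `arcs` maps u -> {v: length}.  Vertices are processed in topological
    order, so every label is permanent when its successors are updated."""
    deps = {}
    for u, outs in arcs.items():
        for v in outs:
            deps.setdefault(v, set()).add(u)
    order = TopologicalSorter(deps).static_order()  # predecessors first
    INF = float("inf")
    shortest = {source: 0}
    longest = {source: 0}
    for u in order:
        if u not in shortest:
            continue  # unreachable from source
        for v, w in arcs.get(u, {}).items():
            shortest[v] = min(shortest.get(v, INF), shortest[u] + w)
            longest[v] = max(longest.get(v, -INF), longest[u] + w)
    return shortest, longest
```

The same pass computes both objectives because acyclicity, not the `min`, is what makes the stage order valid.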

No natural ordering.

Dijkstra's algorithm: path length as DP order/stages.

Bellman-Ford: `k` iterations determine the shortest paths when `k` or fewer arcs may be used.

- Yen's algorithm:
  - reversal: path `1-5-2-3-4` has two reversals, at `5` and `2`.
  - the `i`th iteration finds the length of the shortest path from node `1` to node `k` among all paths with `i-1` or fewer reversals.
  - if the `i`th iteration does not update any node, the algorithm has converged; if convergence does not occur by the `N`th iteration, a negative cycle exists.
  - most efficient.
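A sketch of Bellman-Ford with the convergence and negative-cycle tests described above (arc-list representation is an illustrative assumption):

```python
def bellman_ford(n, arcs, source):
    """After k iterations, dist[v] is the shortest distance from `source`
    to v using k or fewer arcs.  `arcs` is a list of (u, v, length) over
    nodes 0..n-1.  Returns None if a negative cycle is detected."""
    INF = float("inf")
    dist = [INF] * n
    dist[source] = 0
    for _ in range(n - 1):
        updated = False
        for u, v, w in arcs:
            if dist[u] + w < dist[v]:
                dist[v], updated = dist[u] + w, True
        if not updated:       # convergence: no label changed this iteration
            return dist
    for u, v, w in arcs:      # one more pass: any improvement means a negative cycle
        if dist[u] + w < dist[v]:
            return None
    return dist
```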
- All-pairs shortest paths:
  - Floyd
  - Dantzig: similar, but longer to write
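A minimal sketch of Floyd's algorithm, with the set of allowed intermediate vertices as the DP stage:

```python
def floyd(dist):
    """All-pairs shortest distances (Floyd's algorithm).  `dist` is an
    n x n matrix of direct arc lengths (float('inf') where there is no
    arc, 0 on the diagonal).  Returns a new matrix of shortest distances."""
    n = len(dist)
    d = [row[:] for row in dist]
    for k in range(n):              # stage k: allow vertex k as an intermediate
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d
```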

- Variants
  - the value of a path is determined by its max/min arc
  - the value of a path is the product of the arc values
  - the travel time on an arc depends on the start time
  - the value of an arc depends on the previously traveled arc
  - Dijkstra-like algorithm connecting two disjoint subsets

- Definition: a minimum-cost tour traveling all `N` vertexes and returning to the start.

`f_i(j,S)` = the length of the shortest path from city `1` to city `j` via the set of `i` intermediate cities `S`.

Complexity: `N^2 * 2^N`.
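A sketch of this recurrence (Held-Karp), with city `0` standing in for city `1` in the text's notation and `S` encoded as a bitmask:

```python
from itertools import combinations

def tsp(dist):
    """Held-Karp DP for the traveling-salesman problem.  dist[i][j] is the
    arc length between cities 0..n-1.  f[(S, j)] = shortest path from city 0
    to city j via the set S of intermediate cities (S a bitmask over cities
    1..n-1).  Time O(N^2 * 2^N)."""
    n = len(dist)
    f = {(0, j): dist[0][j] for j in range(1, n)}  # no intermediate cities
    for size in range(1, n - 1):
        for subset in combinations(range(1, n), size):
            S = sum(1 << c for c in subset)
            for j in range(1, n):
                if j in subset:
                    continue
                # last intermediate city visited before j is some k in S
                f[(S, j)] = min(f[(S & ~(1 << k), k)] + dist[k][j]
                                for k in subset)
    full = sum(1 << c for c in range(1, n))
    # close the tour: end at some city j, then return to city 0
    return min(f[(full & ~(1 << j), j)] + dist[j][0] for j in range(1, n))
```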

When the distance matrix of the graph is symmetric (assume `N` is even), we can solve all `f_i(j,S)` with `i <= (N-2)/2`, then combine complementary pairs of them. This significantly reduces computing time.

Variant 1: without returning to city 1.

Variant 2: visit other vertexes at least once, not exactly once.

- Exercises:
  - different variants of values on arcs
  - adding the requirement "city `i` must precede city `j`"
  - use `O(2^N)` Dijkstra-like labels to solve the original problem

Problems with a structure different from, but analogous to, the common structure of all linear-programming problems.

Mathematical analysis leads to routine solution of problems with dozens of state variables.

Newton-Raphson procedure to find the minimum/maximum of a function.
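A generic sketch of that idea (my own illustrative version): to find an extremum of `f`, apply Newton-Raphson to the stationarity condition `f'(x) = 0`, iterating `x <- x - f'(x)/f''(x)`.

```python
def newton_extremum(df, d2f, x0, tol=1e-10, max_iter=100):
    """Find a stationary point of f (minimum or maximum) by applying
    Newton-Raphson to f'(x) = 0.  df and d2f are the first and second
    derivatives of f; x0 is the starting guess."""
    x = x0
    for _ in range(max_iter):
        step = df(x) / d2f(x)
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("did not converge")
```

Whether the stationary point is a minimum or a maximum is decided by the sign of `f''` at the result, not by the iteration itself.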

*Linear dynamic system*: `x(i)` is the state at stage `i`, `y(i)` the decision, and `x(i+1) = g(i)x(i) + h(i)y(i)`.

*Quadratic criterion*: `J = Σ[a(i)x^2(i) + c(i)y^2(i)] + l*x^2(N)`.

- Minimize `J`.
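For this system the optimal value function stays quadratic, `f_i(x) = P(i)x^2`, and the optimal decision is linear in the state, `y(i) = K(i)x(i)`. A sketch of the backward recursion under those assumptions (scalar case, names `P` and `K` are mine):

```python
def lqr(N, g, h, a, c, l):
    """Backward DP for x(i+1) = g(i)x(i) + h(i)y(i) with quadratic
    criterion J = sum a(i)x^2(i) + c(i)y^2(i) + l*x^2(N).
    Returns P (value-function coefficients) and K (feedback gains)."""
    P = [0.0] * (N + 1)
    K = [0.0] * N
    P[N] = l  # boundary: terminal cost l * x^2(N)
    for i in range(N - 1, -1, -1):
        # minimize a*x^2 + c*y^2 + P[i+1]*(g*x + h*y)^2 over y = K*x
        K[i] = -P[i + 1] * g[i] * h[i] / (c[i] + P[i + 1] * h[i] ** 2)
        P[i] = (a[i] + c[i] * K[i] ** 2
                + P[i + 1] * (g[i] + h[i] * K[i]) ** 2)
    return P, K
```

This is the analytical route the chapter points at: because `P` and `K` are computed once, the problem is solved for every initial state simultaneously.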

TODO: not read.

TODO: not read.

TODO: not read.

- TODO: not read

- Complete pack:
  - item `i`
  - weight of item: `w_i`
  - value of item: `v_i`
  - capacity: `w`

Graph with `0..w` as vertexes. One kind of item is one kind of arc. Find the longest path in the graph.

- Slightly improved version of algorithm 1: each solution has a unique representation, so no solution is computed multiple times.

*Forward-looking* procedure: use the permanent (optimal) value of `f(w)` to compute temporary values of `f(w)` for larger `w`.
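A minimal sketch of the forward-looking procedure for the complete pack (item list and names are illustrative):

```python
def complete_pack(capacity, items):
    """Unbounded (complete) knapsack by the forward-looking procedure:
    once f[x] is permanent (optimal) — all updates into x come from
    smaller capacities already processed — use it to update temporary
    values f[x + w_i] for larger capacities.  `items` is a list of
    (w_i, v_i) pairs; each kind may be used any number of times."""
    f = [0] * (capacity + 1)  # f[x] = best value achievable with capacity x
    for x in range(capacity + 1):
        for w, v in items:
            if x + w <= capacity and f[x] + v > f[x + w]:
                f[x + w] = f[x] + v
    return f[capacity]
```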

- Use *breakpoints*.