Dynamic programming
Dynamic programming is a sequential way of solving complex problems by breaking them down into sub-problems and solving each of them. Once it solves the sub-problems, then it puts those subproblem solutions together to solve the original complex problem. In the reinforcement learning world, Dynamic Programming is a solution methodology to compute optimal policies given a perfect model of the environment as a Markov Decision Process (MDP).
Dynamic programming holds good for problems which have the following two properties. MDPs, in fact, satisfy both properties, which makes DP a good fit for solving them by solving Bellman Equations:
- Optimal substructure
- Principle of optimality applies
- Optimal solution can be decomposed into sub-problems
- Overlapping sub-problems
- Sub-problems recur many times
- Solutions can be cached and reused
- MDP satisfies both the properties - luckily!
- Bellman equations have recursive decomposition of state-values
- Value function stores and reuses solutions
Though...