Dynamic programming
Dynamic programming (DP) is a technique for solving complex problems. In DP, instead of solving a complex problem as a whole, we break the problem into simple sub-problems, then for each sub-problem, we compute and store the solution. If the same subproblem occurs, we don't recompute; instead, we use the already computed solution. Thus, DP helps in drastically minimizing the computation time. It has its applications in a wide variety of fields including computer science, mathematics, bioinformatics, and so on.
Now, we will learn about two important methods that use DP to find the optimal policy. The two methods are:
- Value iteration
- Policy iteration
Note that dynamic programming is a model-based method meaning that it will help us to find the optimal policy only when the model dynamics (transition probability) of the environment are known. If we don't have the model dynamics, we cannot apply DP methods.
The...