Solving the Bellman equation
We can find the optimal policies by solving the Bellman optimality equation. To solve the Bellman optimality equation, we use a special technique called dynamic programming.
Dynamic programming
Dynamic programming (DP) is a technique for solving complex problems. In DP, instead of solving complex problems one at a time, we break the problem into simple sub-problems, then for each sub-problem, we compute and store the solution. If the same sub-problem occurs, we will not recompute, instead, we use the already computed solution. Thus, DP helps in drastically minimizing the computation time. It has its applications in a wide variety of fields including computer science, mathematics, bioinformatics, and so on.
We solve a Bellman equation using two powerful algorithms:
- Value iteration
- Policy iteration
Value iteration
In value iteration, we start off with a random value function. Obviously, the random value function might not be an optimal one, so we look for a new improved...