Introduction
In the previous chapter, we introduced the OpenAI Gym environment and learned how to implement custom environments for a given application. We also covered the basics of TensorFlow 2, how to implement a policy using the TensorFlow 2 framework, and how to visualize learning using TensorBoard. In this chapter, we will see how Dynamic Programming (DP) works in general, from a computer science perspective. Then, we'll go over how and why it is used in RL. Next, we will dive deep into classic DP algorithms such as policy evaluation, policy iteration, and value iteration, and compare them. Lastly, we will implement the algorithms in the classic coin-change problem.
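As a preview of the coin-change problem mentioned above, here is a minimal sketch of the idea behind DP: solve each subproblem once and reuse the stored answer. It assumes the "fewest coins" variant of the problem, which may differ from the variant implemented later in the chapter. The function name `min_coins`, its helper `solve`, and the coin denominations are illustrative choices, not taken from the text.

```python
from functools import lru_cache


def min_coins(coins, amount):
    """Return the fewest coins needed to make `amount`, or -1 if impossible.

    Hypothetical helper for illustration; assumes positive integer coins.
    """
    coins = tuple(coins)

    @lru_cache(maxsize=None)  # cache each subproblem so it is solved only once
    def solve(remaining):
        if remaining == 0:
            return 0
        best = float("inf")
        for coin in coins:
            if coin <= remaining:
                # Reuse the (cached) answer for the smaller amount.
                best = min(best, 1 + solve(remaining - coin))
        return best

    result = solve(amount)
    return -1 if result == float("inf") else result


print(min_coins([1, 5, 10, 25], 63))  # 6  (25 + 25 + 10 + 1 + 1 + 1)
```

Without the cache, `solve` would be called repeatedly for the same remaining amount, and the running time would grow exponentially with `amount`. With the cache, each amount from 0 to `amount` is computed at most once.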
DP is one of the most fundamental topics in computer science. Furthermore, RL algorithms such as value iteration and policy iteration, as we will see, use the same basic principle: avoid repeated computations by storing and reusing the results of subproblems, which is what DP is all about. The philosophy...