Reinforcement learning algorithms
In this section, we will cover a series of learning algorithms. We will start with dynamic programming, which assumes that the transition dynamics (that is, the environment dynamics) are known. However, in most RL problems, this is not the case. To work around unknown environment dynamics, RL techniques were developed that learn by interacting with the environment. These techniques include Monte Carlo (MC), temporal difference (TD) learning, and the increasingly popular Q-learning and deep Q-learning approaches.
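To make the distinction between known and unknown dynamics concrete, the following minimal sketch contrasts the two settings on a made-up, one-step problem. Everything in it is an assumption for illustration only: the `DYNAMICS` table, the function names, and the reward probabilities are hypothetical and do not come from a specific library or from later sections of this chapter. With the dynamics available, an expectation can be computed directly, as dynamic programming does. Without them, it can only be estimated from sampled interactions, which is the idea behind MC and TD methods.

```python
import random

# Hypothetical toy dynamics (an assumption for illustration):
# in state 0, action 0 yields reward 1.0 with probability 0.8
# and reward 0.0 with probability 0.2.
DYNAMICS = {(0, 0): [(0.8, 1.0), (0.2, 0.0)]}  # (probability, reward) pairs


def expected_reward_from_model(state, action):
    """Known dynamics: compute the expected reward exactly from the table."""
    return sum(p * r for p, r in DYNAMICS[(state, action)])


def step(state, action, rng):
    """Environment interface: returns one sampled reward.

    The agent calling this function never sees the probabilities.
    """
    outcomes = DYNAMICS[(state, action)]
    probs = [p for p, _ in outcomes]
    rewards = [r for _, r in outcomes]
    return rng.choices(rewards, weights=probs, k=1)[0]


def estimated_reward_from_samples(state, action, n_samples, rng):
    """Unknown dynamics: average the rewards observed over many interactions."""
    total = sum(step(state, action, rng) for _ in range(n_samples))
    return total / n_samples


rng = random.Random(0)  # fixed seed so repeated runs give the same estimate
exact = expected_reward_from_model(0, 0)
estimate = estimated_reward_from_samples(0, 0, 10_000, rng)
print(f"exact={exact:.3f}, sampled estimate={estimate:.3f}")
```

The sampled estimate approaches the exact value as the number of interactions grows, but it never requires access to the probability table itself.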
Figure 19.5 outlines the progression of RL algorithms, from dynamic programming to Q-learning:
Figure 19.5: Different types of RL algorithms
In the following sections of this chapter, we will step through each of these RL algorithms. We will start with dynamic programming, before moving on to MC, and finally on to TD and its branches of on-policy SARSA (state–action–reward–...