Here, we'll go over some of the most important concepts that we'll need to bear in mind throughout our study of RL. We'll focus heavily on topics that are specific to Q-learning, but we'll also touch on other branches of RL, such as the closely related SARSA algorithm and policy-based RL algorithms.
Key concepts in RL
Value-based versus policy-based iteration
We'll be using value-based iteration for the projects in this book. The description of the Bellman equation given previously offers a high-level picture of how value-based iteration works. The key difference between the two approaches is that in value-based iteration, the agent learns the expected reward value of each state-action pair and derives its behavior from those values, whereas in policy-based iteration, the agent learns the policy itself directly, without first estimating a value for every state-action pair.
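To make the value-based idea concrete, here is a minimal sketch of tabular Q-learning, where the agent maintains a table of expected reward values for each state-action pair and updates it with a Bellman-style rule. The toy corridor environment, the hyperparameter values, and all names here are illustrative assumptions, not taken from the book's projects.

```python
import random

# Illustrative toy environment (an assumption, not from the book):
# a 1-D corridor with states 0..4; reaching state 4 yields reward 1.
N_STATES = 5
ACTIONS = (0, 1)   # 0 = move left, 1 = move right
ALPHA = 0.1        # learning rate
GAMMA = 0.9        # discount factor
EPSILON = 0.1      # exploration rate for epsilon-greedy selection

def step(state, action):
    """Return (next_state, reward, done) for the toy corridor."""
    next_state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def train(episodes=500, seed=0):
    rng = random.Random(seed)
    # The Q-table: one expected-reward estimate per state-action pair.
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: mostly exploit the current estimates,
            # occasionally explore a random action.
            if rng.random() < EPSILON:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward, done = step(state, action)
            # Bellman-style update: move the estimate toward the
            # observed reward plus the discounted best next value.
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
            state = next_state
    return q

q = train()
```

After training, the learned values encode the behavior implicitly: in every non-terminal state, the "move right" entry outweighs the "move left" entry, so a greedy agent reading the table walks straight to the goal. A policy-based method would instead parameterize and adjust that behavior directly, with no such table.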