Advanced Policy Estimation Algorithms
In this chapter, we'll complete our exploration of Reinforcement Learning (RL), focusing on more complex algorithms that can be employed to solve difficult problems. RL is an extremely broad topic, and even an entire book could not cover it completely; this chapter is therefore built around practical examples that you can use as a starting point for more complex scenarios.
The topics that will be discussed in this chapter are:
- The TD(λ) algorithm
- Actor-Critic TD(0)
- SARSA
- Q-learning, including an example based on a simple visual input and a neural network
- Direct policy search through policy gradient
We can now start analyzing the natural extension of the TD(0) algorithm, which takes into account a longer sequence of transitions in order to obtain a more accurate estimate of the value function.
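To make the idea of "a longer sequence of transitions" concrete before the formal treatment, here is a minimal sketch, assuming an episodic task and the usual forward-view definition of the λ-return. The function names (`n_step_return`, `lambda_return`) and the toy episode are illustrative assumptions, not code from this chapter. TD(0) bootstraps after a single real reward, while larger values of n (or of λ) rely on more observed rewards before falling back on the current value estimates.

```python
import numpy as np


def n_step_return(rewards, values, gamma, n):
    """Target for the first state using n observed rewards, then bootstrapping.

    rewards[k] is the reward for the transition s_k -> s_{k+1}.
    values[k] is the current estimate V(s_k); values[T] is the terminal
    state and is assumed to be 0.
    """
    rewards = np.asarray(rewards, dtype=float)
    T = len(rewards)
    n = min(n, T)  # we cannot look past the end of the episode
    discounts = gamma ** np.arange(n)
    return float(np.dot(discounts, rewards[:n]) + gamma ** n * values[n])


def lambda_return(rewards, values, gamma, lam):
    """Forward-view lambda-return: a weighted average of all n-step returns."""
    T = len(rewards)
    g = 0.0
    # n-step returns for n = 1..T-1 get weight (1 - lam) * lam^(n-1)
    for n in range(1, T):
        g += (1.0 - lam) * lam ** (n - 1) * n_step_return(rewards, values, gamma, n)
    # The remaining weight lam^(T-1) goes to the full (Monte Carlo) return
    g += lam ** (T - 1) * n_step_return(rewards, values, gamma, T)
    return g


# Toy episode (hypothetical): three transitions, reward only at the end,
# and a value function that is still initialized to zero everywhere.
rewards = [0.0, 0.0, 1.0]
values = [0.0, 0.0, 0.0, 0.0]
gamma = 0.9

td0_target = n_step_return(rewards, values, gamma, n=1)
full_target = n_step_return(rewards, values, gamma, n=3)
mixed_target = lambda_return(rewards, values, gamma, lam=0.5)
```

In this toy episode the one-step target carries no information about the final reward, because the bootstrapped estimates are still zero, whereas the three-step target already reflects the discounted reward. Setting `lam=0` reproduces the TD(0) target and `lam=1` the full return; intermediate values blend the two.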