6. Policy Gradient methods using Keras
The four policy gradient methods (Algorithm 10.2.1 to Algorithm 10.5.1) discussed in the previous sections use identical policy and value network models: the networks shown in Figure 10.2.1 to Figure 10.4.1 all have the same configuration. The four methods differ only in:
- Performance and value gradient formulas
- Training strategy
In this section, we will discuss the tf.keras implementation of the routines common to Algorithm 10.2.1 to Algorithm 10.5.1 in a single program.
The complete code can be found at https://github.com/PacktPublishing/Advanced-Deep-Learning-with-Keras.
But before discussing the implementation, let's briefly explore the training environment.
Unlike Q-learning, policy gradient methods are applicable to both discrete and continuous action spaces. In our example, we'll demonstrate the four policy gradient methods on a continuous action space...
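To make the continuous case concrete, the following is a minimal sketch of how a policy can act in a continuous action space. It assumes a Gaussian policy, which is one common choice and not necessarily the one used in the book's code. The helper names gaussian_log_prob and sample_action, the action bounds of [-1, 1], and the example mean and standard deviation are all hypothetical; in a real agent the mean and standard deviation would come from the policy network.

```python
import math
import random


def gaussian_log_prob(action, mean, stddev):
    """Log-density of `action` under a normal distribution N(mean, stddev^2)."""
    variance = stddev ** 2
    return (-0.5 * math.log(2.0 * math.pi * variance)
            - (action - mean) ** 2 / (2.0 * variance))


def sample_action(mean, stddev, low=-1.0, high=1.0, rng=random):
    """Draw a continuous action from N(mean, stddev^2).

    Returns the action clipped to [low, high] (assumed bounds) together
    with the log-probability of the raw, unclipped sample. The
    log-probability is the quantity a policy gradient update would use.
    """
    raw = rng.gauss(mean, stddev)
    log_prob = gaussian_log_prob(raw, mean, stddev)
    clipped = min(max(raw, low), high)
    return clipped, log_prob


# Hypothetical policy outputs for a single state (not taken from the book).
rng = random.Random(0)  # seeded generator so the run is repeatable
action, log_prob = sample_action(mean=0.2, stddev=0.5, rng=rng)
print("action:", action, "log-prob:", log_prob)
```

Because the action is sampled from a parameterized distribution, no maximization over a finite set of actions is needed, which is the step that ties Q-learning to discrete action spaces.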