Actor-critic using Kronecker-factored trust region
ACKTR, as the name suggests, is an actor-critic algorithm based on Kronecker factorization and a trust region.
We know that the actor-critic architecture consists of an actor network and a critic network: the actor produces a policy, and the critic evaluates the policy produced by the actor. We learned that in the actor network (policy network), we compute gradients and update the network's parameters using gradient ascent:
$$\theta = \theta + \alpha \nabla_\theta J(\theta)$$
Here $\theta$ denotes the actor network's parameters, $\alpha$ the learning rate, and $J(\theta)$ the policy objective.
Instead of updating the actor network's parameters with the preceding update rule, we can also update them using the natural gradient:
$$\theta = \theta + \alpha F^{-1} \nabla_\theta J(\theta)$$
Here $F$ is the Fisher information matrix. Thus, the natural gradient is just the product of the inverse of the Fisher matrix and the standard gradient:
$$\tilde{\nabla}_\theta J(\theta) = F^{-1} \nabla_\theta J(\theta)$$
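To make the product of the inverse Fisher matrix and the standard gradient concrete, here is a minimal NumPy sketch for a toy softmax policy over three actions. It is an illustration under stated assumptions, not the ACKTR implementation: the function names, the advantage values, and the damping term added to keep the Fisher matrix invertible are all hypothetical choices, and the Fisher matrix is formed exactly here rather than approximated with Kronecker factors.

```python
import numpy as np

def softmax(theta):
    """Numerically stable softmax over a vector of logits."""
    z = np.exp(theta - np.max(theta))
    return z / z.sum()

def natural_gradient_step(theta, advantages, alpha=0.1, damping=1e-3):
    """One natural-gradient ascent step for a stateless softmax policy.

    theta      : logits, one per action (the actor's parameters in this toy)
    advantages : assumed per-action advantage estimates (hypothetical input)
    damping    : small multiple of the identity added to the Fisher matrix,
                 an assumption made here because the softmax Fisher is singular
    """
    pi = softmax(theta)
    n = len(theta)
    # Row a holds the score vector grad_theta log pi(a) = onehot(a) - pi.
    scores = np.eye(n) - pi
    # Standard policy gradient: expectation over actions of score * advantage.
    grad = scores.T @ (pi * advantages)
    # Fisher information matrix: expectation of the outer product of scores.
    fisher = scores.T @ (pi[:, None] * scores)
    # Natural gradient: solve (F + damping * I) x = grad rather than inverting F.
    nat_grad = np.linalg.solve(fisher + damping * np.eye(n), grad)
    return theta + alpha * nat_grad, grad, nat_grad

theta0 = np.zeros(3)
advantages = np.array([1.0, 0.0, -1.0])
theta1, grad, nat_grad = natural_gradient_step(theta0, advantages)
print("standard gradient:", grad)
print("natural gradient: ", nat_grad)
print("updated policy:   ", softmax(theta1))
```

Solving the linear system avoids forming the inverse explicitly, but it still scales poorly with the number of parameters, since the Fisher matrix has one row and one column per parameter.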
The advantage of the natural gradient is that it guarantees a monotonic improvement in the policy. However, updating the actor network (policy...