Further reading
For more information, refer to the following papers:
- "Trust Region Policy Optimization" by John Schulman, Sergey Levine, Philipp Moritz, Michael I. Jordan, and Pieter Abbeel: https://arxiv.org/pdf/1502.05477.pdf
- "Proximal Policy Optimization Algorithms" by John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov: https://arxiv.org/pdf/1707.06347.pdf
- "Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation" by Yuhuai Wu, Elman Mansimov, Shun Liao, Roger Grosse, and Jimmy Ba: https://arxiv.org/pdf/1708.05144.pdf