We have so far avoided going too deep into the more advanced inner workings of the proximal policy optimization (PPO) algorithm, even sidestepping any discussion of policy-based versus model-based methods. If you recall, PPO is the reinforcement learning (RL) method, first developed at OpenAI, that powers ML-Agents, and it is a policy-based algorithm. In this chapter, we will look at the differences between policy- and model-based RL algorithms, as well as the more advanced inner workings of the Unity implementation.
The following is a list of the main topics we will cover in this chapter:
- Marathon reinforcement learning
- The partially observable Markov decision process
- Actor-Critic and continuous action spaces
- Understanding TRPO and PPO
- Tuning PPO with hyperparameters
The content in this chapter is at an advanced level, and assumes that you have covered several previous chapters and exercises...