Implementing the Proximal Policy Optimization algorithm and PPO agent
The Proximal Policy Optimization (PPO) algorithm builds upon Trust Region Policy Optimization (TRPO), which constrains each policy update to stay within a trust region around the old policy. PPO captures the same core idea with a clipped surrogate objective function that is much simpler to implement, yet remains powerful and sample efficient. It is one of the most widely used RL algorithms, especially for continuous control problems. By the end of this recipe, you will have built a PPO agent that you can train in your RL environment of choice.
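To make the core idea concrete, here is a minimal NumPy sketch of the clipped surrogate objective. The function name, the example ratios, and the advantage values are hypothetical illustrations, not part of the recipe's actual agent code:

```python
import numpy as np

def ppo_clipped_objective(ratios, advantages, clip_epsilon=0.2):
    """Clipped surrogate objective (illustrative sketch).

    ratios: probability ratios pi_new(a|s) / pi_old(a|s) for a batch of transitions.
    advantages: advantage estimates for the same transitions.
    """
    unclipped = ratios * advantages
    # Clipping the ratio to [1 - eps, 1 + eps] removes the incentive to move
    # the new policy far from the old one.
    clipped = np.clip(ratios, 1.0 - clip_epsilon, 1.0 + clip_epsilon) * advantages
    # PPO maximizes the element-wise minimum, a pessimistic (lower) bound.
    return np.minimum(unclipped, clipped).mean()

# Example: a ratio above 1 + eps is clipped when the advantage is positive.
obj = ppo_clipped_objective(np.array([1.5, 0.9]), np.array([1.0, -1.0]))
```

In an actual agent, the negative of this quantity is minimized by gradient descent on the policy network's parameters.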
Getting ready
To complete this recipe, you will first need to activate the tf2rl-cookbook Conda Python virtual environment and run pip install -r requirements.txt. If the following import statements run without issues, you are ready to get started!
import argparse
import os
from datetime import datetime
import gym
import numpy as np
import tensorflow as tf
from...