Training a cheetah bot to run using PPO
In this section, we train the 2D cheetah bot to run using Proximal Policy Optimization (PPO). First, import the necessary libraries:
import gym
from stable_baselines.common.policies import MlpPolicy
from stable_baselines.common.vec_env import DummyVecEnv, VecNormalize
from stable_baselines import PPO2
Create a vectorized environment using DummyVecEnv:
env = DummyVecEnv([lambda: gym.make("HalfCheetah-v2")])
Normalize the states (observations) by wrapping the environment in VecNormalize:
env = VecNormalize(env, norm_obs=True)
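To make the normalization step concrete, here is a minimal pure-Python sketch of running mean and variance normalization, the general idea behind wrapping the environment in VecNormalize. The class name RunningNormalizer and the epsilon and clip values are assumptions chosen for illustration; this is not the library's own implementation.

```python
import math


class RunningNormalizer:
    """Hypothetical helper: tracks a running mean and variance per dimension."""

    def __init__(self, dim, epsilon=1e-8, clip=10.0):
        self.mean = [0.0] * dim
        self.m2 = [0.0] * dim  # running sum of squared deviations (Welford's method)
        self.count = 0
        self.epsilon = epsilon  # assumed small constant to avoid division by zero
        self.clip = clip        # assumed bound on the normalized values

    def update(self, obs):
        """Fold one observation into the running statistics."""
        self.count += 1
        for i, x in enumerate(obs):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.count
            self.m2[i] += delta * (x - self.mean[i])

    def normalize(self, obs):
        """Return the observation shifted, scaled, and clipped."""
        out = []
        for i, x in enumerate(obs):
            var = self.m2[i] / self.count if self.count > 0 else 1.0
            z = (x - self.mean[i]) / math.sqrt(var + self.epsilon)
            out.append(max(-self.clip, min(self.clip, z)))
        return out


normalizer = RunningNormalizer(dim=2)
for obs in [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]:
    normalizer.update(obs)

normalized = normalizer.normalize([2.0, 30.0])
print(normalized)
```

Keeping every state dimension on a similar scale tends to make policy-gradient training more stable, which is the motivation for normalizing before instantiating the agent.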
Instantiate the agent:
agent = PPO2(MlpPolicy, env)
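For intuition about what the PPO agent optimizes, the following is a minimal sketch of PPO's clipped surrogate objective for individual samples. The function name clipped_surrogate, the clip_range value of 0.2, and the sample numbers are assumptions for illustration; the library computes this over batches of tensors rather than Python floats.

```python
def clipped_surrogate(ratio, advantage, clip_range=0.2):
    """Hypothetical helper: per-sample PPO clipped objective.

    ratio is pi_new(a|s) / pi_old(a|s); advantage is the estimated advantage.
    """
    clipped_ratio = max(1.0 - clip_range, min(1.0 + clip_range, ratio))
    # Taking the minimum removes the incentive to move the ratio far from 1.
    return min(ratio * advantage, clipped_ratio * advantage)


# Illustrative (ratio, advantage) pairs, not real training data.
samples = [(1.5, 1.0), (0.5, -1.0), (1.1, 2.0)]
objective = sum(clipped_surrogate(r, a) for r, a in samples) / len(samples)
print(objective)
```

In the first pair, the ratio of 1.5 is clipped to 1.2, so the policy gains nothing extra from moving further in that direction; this is what keeps each update close to the previous policy.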
Train the agent:
agent.learn(total_timesteps=250000)
After training, we can see how our trained cheetah bot learned to run by rendering the environment:
state = env.reset()
# Runs until interrupted; the vectorized environment resets itself when an episode ends.
while True:
    action, _ = agent.predict(state)
    next_state, reward, done, info = env.step(action)
    state = next_state
    env.render()
Save the whole code used in this...