Warm starts with demonstrations
A popular technique for showing the agent a path to success is to train it on data collected from a reasonably successful controller, such as a human. In RLlib, this can be done by saving human play data from the mountain car environment:
Chapter10/mcar_demo.py
...
new_obs, r, done, info = env.step(a)

# Build the batch
batch_builder.add_values(
    t=t,
    eps_id=eps_id,
    agent_index=0,
    obs=prep.transform(obs),
    ...
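Setting the RLlib-specific calls aside, the core idea is to roll out a scripted (or human) controller and log every transition as a record the learner can later consume. The sketch below illustrates that pattern with stdlib Python only; the toy dynamics, the energy-pumping policy, and the JSON-lines output are illustrative stand-ins, not RLlib's actual `batch_builder`/`JsonWriter` API:

```python
import io
import json
import math

def scripted_policy(obs):
    # Classic mountain-car heuristic: push in the direction of motion
    # (2 = push right, 0 = push left) to pump energy into the swing.
    position, velocity = obs
    return 2 if velocity >= 0 else 0

def toy_step(obs, action):
    # Illustrative stand-in for env.step() with simplified
    # mountain-car dynamics; not the Gym implementation.
    position, velocity = obs
    velocity += 0.001 * (action - 1) - 0.0025 * math.cos(3 * position)
    position += velocity
    done = position >= 0.5
    return (position, velocity), -1.0, done

def record_episode(out, eps_id, max_steps=200):
    obs = (-0.5, 0.0)  # start near the valley floor
    for t in range(max_steps):
        a = scripted_policy(obs)
        new_obs, r, done = toy_step(obs, a)
        # One record per transition, mirroring the fields passed to
        # batch_builder.add_values(...) in the RLlib snippet above.
        out.write(json.dumps({
            "t": t, "eps_id": eps_id, "obs": obs, "action": a,
            "reward": r, "new_obs": new_obs, "done": done,
        }) + "\n")
        obs = new_obs
        if done:
            break

buf = io.StringIO()
record_episode(buf, eps_id=0)
lines = buf.getvalue().splitlines()
```

Each line of the resulting file is one demonstration transition; RLlib's offline data pipeline follows the same per-field logging idea, with `JsonWriter` handling the serialization.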