- Would you use a model-based or a model-free algorithm if you had only 10 games in which to train your agent to play checkers?
- What are the disadvantages of model-based algorithms?
- If the model of the environment is unknown, how can it be learned?
- Why are data aggregation methods used?
- How does ME-TRPO stabilize training?
- How does using an ensemble of models improve policy learning?