- How would you rank DQN, A2C, and ES based on their sample efficiency?
- How would they rank on wall-clock training time if 100 CPUs were available?
- Would you start debugging an RL algorithm on CartPole or MontezumaRevenge?
- Why is it better to use multiple seeds when comparing multiple deep RL algorithms?
- Does an intrinsic reward help an agent explore its environment?
- What is transfer learning?