These exercises are here for you to use and learn from. Attempt at least two or three; the more you complete, the easier the later chapters will be:
- What is the difference between an on-policy and an off-policy agent? (A sketch contrasting the two update rules follows this list.)
- Tune the hyperparameters for any or all of the examples in this chapter, including the new hyperparameter, lambda.
- Change the discretization steps in any example that uses discretization and observe what effect this has on training (a minimal discretization sketch follows this list).
- Use example Chapter_5_3.py, SARSA(0), and adapt it to another Gym environment with a continuous observation space and a discrete action space.
- Use example Chapter_5_4.py, SARSA(λ), and adapt it to another Gym environment with a continuous observation space and a discrete action space (a SARSA(λ) sketch on CartPole follows this list).
- There is a hyperparameter shown in the code that is not used. Which parameter is it?
- Use example Chapter_5_5.py, SARSA(λ) on Lunar Lander, and optimize...
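
For the first exercise, it helps to see the two temporal-difference targets side by side. The sketch below is illustrative, not taken from the chapter's code; the function names and the `alpha` and `gamma` defaults are assumptions. SARSA's target uses the action the agent will actually take next (on-policy), while Q-learning's target uses the greedy action regardless of what the agent does next (off-policy):

```python
import numpy as np

# Illustrative tabular updates; Q is a (n_states, n_actions) array.
# alpha (learning rate) and gamma (discount factor) defaults are guesses.

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    # On-policy: bootstrap from a_next, the action the agent actually takes.
    td_target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (td_target - Q[s, a])

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    # Off-policy: bootstrap from the greedy action, whatever the agent does.
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
```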
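
For the discretization exercise, the piece worth experimenting with is the bin layout. Here is a minimal sketch, assuming NumPy and hand-picked observation bounds (both assumptions, not the chapter's code):

```python
import numpy as np

def make_bins(low, high, n_bins):
    # One array of interior bin edges per observation dimension.
    return [np.linspace(l, h, n_bins + 1)[1:-1] for l, h in zip(low, high)]

def discretize(obs, bins):
    # Map a continuous observation to a tuple of bin indices (a table key).
    return tuple(int(np.digitize(x, b)) for x, b in zip(obs, bins))

# Example: a 4-dimensional observation split into 10 bins per dimension.
bins = make_bins(low=[-4.8, -4.0, -0.418, -4.0],
                 high=[4.8, 4.0, 0.418, 4.0], n_bins=10)
state = discretize([0.1, -0.5, 0.02, 1.3], bins)  # -> (5, 4, 5, 6)
```

Fewer bins mean a smaller Q-table and faster but coarser learning; more bins give finer state resolution at the cost of slower convergence. That trade-off is exactly what the exercise asks you to observe.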
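
For the two adaptation exercises, a compact target is CartPole-v1, which has a continuous observation space and a discrete action space. The sketch below is a minimal SARSA(λ) loop with accumulating eligibility traces; it assumes the classic Gym step API (`obs, reward, done, info`) and illustrative hyperparameter values and clip bounds, none of which come from the chapter's code. Setting `lam = 0` reduces it to SARSA(0) for the Chapter_5_3.py exercise:

```python
import gym
import numpy as np

env = gym.make("CartPole-v1")
n_bins = 10
# CartPole's velocity terms are unbounded, so these clip bounds are a guess.
low = np.array([-4.8, -4.0, -0.418, -4.0])
high = -low
bins = [np.linspace(l, h, n_bins + 1)[1:-1] for l, h in zip(low, high)]

def discretize(obs):
    return tuple(int(np.digitize(x, b)) for x, b in zip(obs, bins))

Q = np.zeros((n_bins,) * 4 + (env.action_space.n,))
alpha, gamma, lam, epsilon = 0.1, 0.99, 0.9, 0.1  # illustrative values

def policy(s):
    # Epsilon-greedy action selection over the discretized state.
    if np.random.rand() < epsilon:
        return env.action_space.sample()
    return int(np.argmax(Q[s]))

for episode in range(500):
    E = np.zeros_like(Q)             # eligibility traces, reset per episode
    s = discretize(env.reset())
    a = policy(s)
    done = False
    while not done:
        obs, r, done, _ = env.step(a)
        s2 = discretize(obs)
        a2 = policy(s2)
        # On-policy TD error; no bootstrapping from terminal states.
        td_error = r + gamma * Q[s2 + (a2,)] * (not done) - Q[s + (a,)]
        E[s + (a,)] += 1.0           # accumulating trace for the visited pair
        Q += alpha * td_error * E    # credit all recently visited pairs
        E *= gamma * lam             # decay traces toward zero
        s, a = s2, a2
env.close()
```

The same pattern ports to other Gym environments by swapping the environment ID, the bin bounds, and the number of observation dimensions in the Q-table shape.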