Things to try
Here is a list of things you can do to improve your understanding of the topic:
- In the D4PG code, I used a simple replay buffer, which was enough to get a good improvement over DDPG. You can try switching the example to a prioritized replay buffer, as we did in Chapter 8, DQN Extensions, and check the effect.
- There are lots of interesting and challenging environments around. For example, you can start with other PyBullet environments, but there is also the DeepMind Control Suite (Tassa, Yuval, et al., DeepMind Control Suite, arXiv abs/1801.00690 (2018)), MuJoCo-based environments in Gym, and many others.
- You can request a trial license for MuJoCo and compare its stability, performance, and resulting policy with those of PyBullet.
- You can play with the very challenging Learning to Run competition from NIPS-2017 (which also took place in 2018 and 2019 with more challenging problems), where you are given a simulator of the human body and your agent...
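As a starting point for the first suggestion in the list, here is a minimal sketch of a proportional prioritized replay buffer. It is not the Chapter 8 implementation: the class name, the method names, and the default values of alpha and beta are assumptions made for illustration, and it uses only the standard library and samples with replacement to stay short. It also assumes that you can obtain a per-sample critic loss to use as the priority.

```python
import random
from collections import namedtuple

# Hypothetical container for one sampled batch (names are illustrative).
Sample = namedtuple("Sample", ["indices", "transitions", "weights"])


class PrioritizedReplayBuffer:
    """Proportional prioritization: P(i) = p_i^alpha / sum_k p_k^alpha."""

    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha      # 0 gives uniform sampling, 1 gives full prioritization
        self.buffer = []
        self.priorities = []
        self.pos = 0            # next slot to overwrite once the buffer is full

    def __len__(self):
        return len(self.buffer)

    def add(self, transition):
        # New transitions get the current maximum priority,
        # so each of them is likely to be sampled at least once.
        max_prio = max(self.priorities, default=1.0)
        if len(self.buffer) < self.capacity:
            self.buffer.append(transition)
            self.priorities.append(max_prio)
        else:
            self.buffer[self.pos] = transition
            self.priorities[self.pos] = max_prio
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        # Sampling with replacement, proportional to the probabilities.
        indices = random.choices(range(len(self.buffer)), weights=probs, k=batch_size)
        # Importance-sampling weights compensate for the non-uniform sampling.
        n = len(self.buffer)
        weights = [(n * probs[i]) ** (-beta) for i in indices]
        max_w = max(weights)
        weights = [w / max_w for w in weights]  # normalize so the largest weight is 1
        return Sample(indices, [self.buffer[i] for i in indices], weights)

    def update_priorities(self, indices, losses, eps=1e-5):
        # eps keeps every priority strictly positive.
        for i, loss in zip(indices, losses):
            self.priorities[i] = abs(loss) + eps
```

In a training loop, you would multiply each sample's critic loss by the corresponding weight before averaging, and then pass the unweighted per-sample losses to update_priorities() together with the sampled indices. A linear scan over all priorities is slow for large buffers, so a serious experiment would need a more efficient structure, such as a sum tree.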