Challenges and future directions
You could be wondering why we are back to talking about RL challenges after finishing an advanced-level book on this topic. Indeed, throughout the book, we presented many approaches to mitigate them. On the other hand, we cannot claim these challenges are solved. So, it is important to call them out and discuss the future directions for each in a concise list to give you a mental map and a compass to navigate through them.
Let's start our discussion with one of the most important challenges: Sample efficiency.
Sample efficiency
As you are now well aware, it takes a lot of data to train an RL model. OpenAI Five, who became a world-class player in the strategy game Dota 2, took 128,000 CPUs and 256 CPUs to train, over many months, collecting a total of 900 years' worth of game experience per day (OpenAI, 2018). RL algorithms are benchmarked on their performances after trained over 10 billion Atari frames (Kapturowski, 2019). This is...