DeepMind marked 2017 by creating the best Go player in the world. How did they achieve this? With deep learning, of course, but more precisely with reinforcement learning.
Deep Blue beat human chess players with traditional game analysis. It built a tree of possible moves and their outcomes, and pruned it with different strategies (such as alpha-beta pruning, adapted to the search space of chess). This approach did not work for Go, whose search space is far too large: computers could not compete with the best human players until DeepMind created their network and its training methods. The training methods matter as much as the network itself, because without training, the network is useless!
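To make the idea of pruning a game tree concrete, here is a minimal sketch of textbook alpha-beta pruning on a tiny hand-made tree. It is not Deep Blue's actual search: the nested-list tree, the leaf scores, and the `alphabeta` and `visited` names are all assumptions made up for this illustration.

```python
import math


def alphabeta(node, alpha, beta, maximizing, visited):
    """Return the minimax value of node, skipping branches that cannot matter."""
    # A leaf is a plain number: the (hypothetical) score for the maximizing player.
    if not isinstance(node, list):
        visited.append(node)  # record which leaves were actually evaluated
        return node
    if maximizing:
        best = -math.inf
        for child in node:
            best = max(best, alphabeta(child, alpha, beta, False, visited))
            alpha = max(alpha, best)
            if alpha >= beta:
                break  # the minimizing player would never allow this branch
        return best
    best = math.inf
    for child in node:
        best = min(best, alphabeta(child, alpha, beta, True, visited))
        beta = min(beta, best)
        if alpha >= beta:
            break  # the maximizing player already has a better option elsewhere
    return best


# Toy tree: the root is our move, each inner list holds the opponent's replies.
tree = [[3, 5], [2, 9], [0, 1]]
visited = []
value = alphabeta(tree, -math.inf, math.inf, True, visited)
print(value, visited)
```

On this toy tree, the search returns 3 after evaluating only four of the six leaves. The same pruning helps far less in Go, where the number of possible moves at each turn is much larger than in chess.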
In this chapter, we will do the following:
- Look at different types of reinforcement learning
- Explore the concept of Q-learning
- Estimate a Q function via a table and via a neural network
- Make a network play an Atari game using Q-learning