Quick recap on reinforcement learning
We first encountered reinforcement learning in Chapter 1, Machine Learning – An Introduction, when we looked at the three different types of learning processes: supervised, unsupervised, and reinforcement. In reinforcement learning, an agent takes actions within an environment and receives rewards for them. For example, the agent might be a mouse in a maze, and the reward might be some food somewhere in that maze. Reinforcement learning can sometimes feel a bit like a supervised recurrent network problem: a network is given a sequence of data and must learn a response.
The key distinction that makes a task a reinforcement learning problem is that the responses the agent gives change the data it receives in future time steps. If the mouse turns left instead of right at a T-junction in the maze, it changes what its next state will be. In contrast, supervised recurrent networks simply predict a series. The predictions they make do not influence the future values in the series.
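The distinction can be made concrete with a small sketch. The toy T-maze below is purely illustrative: the state names, the `step` and `rollout` functions, and the assumption that the food sits in the right arm are all hypothetical and not taken from any library. It shows that the action chosen at the junction determines which state the agent sees next, whereas a supervised predictor leaves its input series untouched.

```python
# Hypothetical toy T-maze; all names and reward values are illustrative assumptions.
LEFT, RIGHT = "left", "right"


def step(state, action):
    """Return (next_state, reward) for a mouse in the T-maze."""
    if state == "junction":
        if action == LEFT:
            return "left_arm", 0.0
        return "right_arm", 1.0  # assumption: the food is in the right arm
    return state, 0.0  # the arms are terminal, so nothing changes


def rollout(policy, start="junction", max_steps=3):
    """Run the agent-environment loop and record the states visited."""
    state, total_reward, visited = start, 0.0, [start]
    for _ in range(max_steps):
        action = policy(state)               # the agent responds to the state
        state, reward = step(state, action)  # the response changes the next state
        total_reward += reward
        visited.append(state)
        if state != "junction":              # stop once an arm is reached
            break
    return visited, total_reward


def always_left(state):
    return LEFT


def always_right(state):
    return RIGHT


left_states, left_reward = rollout(always_left)
right_states, right_reward = rollout(always_right)

# Contrast: a supervised sequence predictor only reads the series.
# Whatever it predicts, the series itself stays the same.
series = [1, 2, 3, 4]
predictions = [x + 1 for x in series[:-1]]  # a naive "next value" guess
```

Running the two policies from the same starting state yields different trajectories, which is exactly the feedback between response and future data described above. The `series` list, by contrast, is identical before and after the predictions are made.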
The AlphaGo...