AlphaGo Zero
We will now continue our discussion of model-based methods by exploring the case where we have a model of the environment, but that environment is shared by two competing parties. This situation is very familiar from board games: the rules of the game are fixed and the full position is observable, but we face an opponent whose primary goal is to prevent us from winning.
In 2017, DeepMind proposed a very elegant approach to solving such problems. It requires no prior domain knowledge beyond the rules of the game; instead, the agent improves its policy solely through self-play. This method is called AlphaGo Zero.
In this chapter, we will:
- Discuss the structure of the AlphaGo Zero method
- Implement the method for playing the game Connect 4