The AlphaGo Zero method
In late 2017, DeepMind published the article "Mastering the game of Go without human knowledge" by Silver et al. in the journal Nature [SSa17]. It presented a novel approach called AlphaGo Zero, which achieved a superhuman level of play in Go without any prior knowledge except the rules of the game (the follow-up method, AlphaZero, later applied the same approach to chess and shogi). The agent improved its policy by repeatedly playing against itself and learning from the outcomes of those games. No large game databases, handcrafted features, or pretrained models were needed. Another nice property of the method is its simplicity and elegance.
In this chapter's example, we will implement this approach for the game Connect 4 (also known as “four in a row” or “four in a line”) and evaluate it ourselves.
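Because the method needs nothing beyond the rules of the game, it may help to see how little those rules amount to for Connect 4. The following is a minimal sketch under stated assumptions, not the chapter's implementation: the board is assumed to be the standard 6 rows by 7 columns, stored as a list of lists, and the helper names `empty_board`, `legal_moves`, `drop`, and `is_win` are hypothetical.

```python
from typing import List

# Hypothetical helpers illustrating Connect 4 rules; not the chapter's code.
ROWS, COLS = 6, 7  # standard Connect 4 board size (assumed)

Board = List[List[int]]


def empty_board() -> Board:
    # 0 = empty cell, 1 = first player, 2 = second player; row 0 is the bottom
    return [[0] * COLS for _ in range(ROWS)]


def legal_moves(board: Board) -> List[int]:
    # A column is playable while its top cell is still empty
    return [c for c in range(COLS) if board[ROWS - 1][c] == 0]


def drop(board: Board, col: int, player: int) -> int:
    # The piece falls to the lowest empty cell of the column; return its row
    for r in range(ROWS):
        if board[r][col] == 0:
            board[r][col] = player
            return r
    raise ValueError(f"column {col} is full")


def is_win(board: Board, row: int, col: int) -> bool:
    # Check whether the piece at (row, col) is part of four in a line:
    # horizontal, vertical, diagonal, and anti-diagonal directions
    player = board[row][col]
    for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
        count = 1
        for sign in (1, -1):  # walk both ways along the direction
            r, c = row + sign * dr, col + sign * dc
            while 0 <= r < ROWS and 0 <= c < COLS and board[r][c] == player:
                count += 1
                r += sign * dr
                c += sign * dc
        if count >= 4:
            return True
    return False


if __name__ == "__main__":
    # Player 1 stacks column 3 while player 2 plays column 0
    board = empty_board()
    won = False
    for _ in range(4):
        row = drop(board, 3, 1)
        won = is_win(board, row, 3)
        if not won:
            drop(board, 0, 2)
    print(won)  # True: four in a column
```

Checking only the lines that pass through the most recently placed piece keeps win detection cheap, since any new four in a row must include that piece.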
First, we will discuss the structure of the method. The whole system consists of several parts that need to be understood before we can implement them.
...