MuZero
The successor of AlphaGo Zero (published in 2017) was a method called MuZero, described by Schrittwieser et al. from DeepMind in the paper Mastering Atari, Go, chess and shogi by planning with a learned model [Sch+20], published in 2020. In it, the authors generalized the approach by removing the requirement for a precise game model while still keeping the method in the model-based family. As we saw in the description of AlphaGo Zero, the game model is used heavily during training: in the MCTS phase, it provides the actions available in the current state and the new game state that results from applying an action. In addition, the game model provides the final game outcome: whether we have won or lost the game.
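To make this dependency concrete, here is a minimal sketch (with hypothetical names, not taken from the paper) of the exact game-model interface that AlphaGo Zero-style MCTS queries during training; MuZero's contribution is to learn stand-ins for these queries rather than requiring them from the environment:

```python
# A minimal sketch, assuming a hypothetical GameModel interface. These are
# the three queries AlphaGo Zero-style MCTS makes against an exact game
# model; MuZero replaces them with learned networks.
from abc import ABC, abstractmethod
from typing import List, Optional


class GameModel(ABC):
    @abstractmethod
    def legal_actions(self, state) -> List[int]:
        """Actions available in the given state (used to expand MCTS nodes)."""

    @abstractmethod
    def next_state(self, state, action):
        """Exact successor state after applying an action (used to descend the tree)."""

    @abstractmethod
    def outcome(self, state) -> Optional[float]:
        """Final result (+1 win, -1 loss, 0 draw) if the game is over, else None."""
```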
At first glance, it looks almost impossible to remove the model from the training process, but MuZero not only demonstrated how this could be done but also beat the previous AlphaGo Zero records in...