Connect 4 with MuZero
Now that we have discussed the method, let's look at its implementation and its results on Connect 4. The implementation consists of several modules:
- lib/muzero.py: Contains the MCTS data structures and functions, the neural networks, and the batch generation logic
- train-mu.py: The training loop, which implements self-play for episode generation, training, and periodic validation of the currently trained model against the best model (the same as in the AlphaGo Zero method)
- play-mu.py: Plays a series of games between a list of models to rank them
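The "current versus best" validation in train-mu.py can be illustrated with a small, self-contained sketch. This is not the book's code. The `play_game(first, second)` callable and its return convention are hypothetical: +1 if the first player wins, -1 if the second wins, 0 for a draw. Counting a draw as half a win and using a 55% promotion threshold are also assumptions, modeled on the AlphaGo Zero evaluation step. Each round plays two games so that both models move first once, which balances Connect 4's first-move advantage.

```python
from typing import Callable

# Hypothetical signature, not taken from train-mu.py:
# play_game(first, second) -> +1 if `first` wins, -1 if `second` wins, 0 on draw.
GameFn = Callable[[object, object], int]


def evaluate_candidate(play_game: GameFn, candidate, best, rounds: int = 20) -> float:
    """Return the candidate's score ratio against the best model.

    Every round plays two games, once with each model moving first.
    Draws count as half a win (an assumption of this sketch).
    """
    score = 0.0
    for _ in range(rounds):
        for candidate_first in (True, False):
            if candidate_first:
                result = play_game(candidate, best)
            else:
                # Flip the sign so that result is always from the candidate's view.
                result = -play_game(best, candidate)
            if result > 0:
                score += 1.0
            elif result == 0:
                score += 0.5
    return score / (2 * rounds)


def maybe_promote(play_game: GameFn, candidate, best, threshold: float = 0.55):
    """Return the model that should be treated as 'best' after validation."""
    ratio = evaluate_candidate(play_game, candidate, best)
    return candidate if ratio > threshold else best
```

With a dummy `play_game` that always lets the first player win, both models score exactly 0.5. The candidate is therefore not promoted, which is the outcome the alternating move order is meant to produce.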
Hyperparameters and MCTS tree nodes
Most MuZero hyperparameters are collected in a separate dataclass, which simplifies passing them around the code:
@dataclass
class MuZeroParams:
    actions_count: int = game.GAME_COLS
    max_moves: int = game.GAME_COLS * game.GAME_ROWS
    ...
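A minimal sketch of why a parameter dataclass is convenient: one object carries every setting, and variants can be derived without editing the defaults. It assumes the standard 7-column by 6-row Connect 4 board as local stand-ins for the `game` module constants. The `num_simulations` and `discount` fields and the `describe()` helper are hypothetical and are not part of the book's `MuZeroParams`.

```python
from dataclasses import dataclass, replace

# Local stand-ins for game.GAME_COLS / game.GAME_ROWS (standard Connect 4 board).
GAME_COLS = 7
GAME_ROWS = 6


@dataclass
class MuZeroParams:
    actions_count: int = GAME_COLS               # one action per column
    max_moves: int = GAME_COLS * GAME_ROWS       # the board holds at most 42 discs
    # Hypothetical extra fields, for illustration only.
    num_simulations: int = 50
    discount: float = 1.0


def describe(params: MuZeroParams) -> str:
    # Hypothetical helper: functions receive one params object
    # instead of a long list of separate arguments.
    return (f"{params.actions_count} actions, up to {params.max_moves} moves, "
            f"{params.num_simulations} sims")


default = MuZeroParams()
# dataclasses.replace() returns a copy with the given fields overridden.
quick = replace(default, num_simulations=10)
print(describe(quick))
```

Because `replace()` builds a new instance, a cheaper configuration (for example, one with fewer MCTS simulations for quick tests) can coexist with the default one without either being mutated.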