Training tic-tac-toe agents through self-play
In this section, we walk through the key parts of the code in our GitHub repo so that you can get a better grasp of MARL with RLlib while training tic-tac-toe agents on a 3x3 board. For the full code, refer to https://github.com/PacktPublishing/Mastering-Reinforcement-Learning-with-Python.
Let's start by designing the multi-agent environment.
Designing the multi-agent tic-tac-toe environment
The game has two agents, X and O. We will train four policies for the agents to pull their actions from, and each policy can play either X or O. We construct the environment class as follows:
Chapter09/tic_tac_toe.py
class TicTacToe(MultiAgentEnv):
    def __init__(self, config=None):
        ...
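To make the setup more concrete, here is a minimal sketch in plain Python (no RLlib) of two ideas from this section: drawing a policy for each agent from a pool of four, and detecting a win on the 3x3 board. It is not the code from the repo. The names `POLICY_IDS`, `assign_policies`, `winner`, and `play_random_episode` are hypothetical, and the sketch assumes that X moves first and that cells are indexed 0 to 8 row by row. In RLlib, the agent-to-policy assignment is normally handled by a policy mapping function in the multi-agent configuration rather than by a helper like this one.

```python
import random

# Hypothetical names: the repo may label agents and policies differently.
AGENTS = ("X", "O")
POLICY_IDS = [f"policy_{i}" for i in range(4)]

# The eight winning lines on a 3x3 board, with cells indexed 0..8 row by row.
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


def assign_policies(rng):
    """Pick one of the four policies for each agent at the start of an episode."""
    return {agent: rng.choice(POLICY_IDS) for agent in AGENTS}


def winner(board):
    """Return "X" or "O" if that mark fills a winning line, else None."""
    for a, b, c in WIN_LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None


def play_random_episode(rng):
    """Play one game with uniformly random moves (assumption: X moves first)."""
    board = [None] * 9
    for turn in range(9):
        mark = AGENTS[turn % 2]
        empty = [i for i, cell in enumerate(board) if cell is None]
        board[rng.choice(empty)] = mark
        if winner(board) is not None:
            break
    return board, winner(board)


if __name__ == "__main__":
    rng = random.Random(0)
    print(assign_policies(rng))
    print(play_random_episode(rng))
```

Because any of the four policies can be drawn for either agent, each policy ends up playing both X and O over many episodes, which is what lets a single pool of policies learn through self-play.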