Continuing with the bandit algorithms, we will explore an improvement over the UCB1 algorithm called regret matching. We will use the same rock-paper-scissors example, but the technique can be repurposed for other types of games, such as fighting games.
Implementing regret matching
Getting ready...
It's important to have read the previous recipe and to be familiar with its member variables and data structures. Its member functions are not reused here, because this recipe implements a different set of them, but the algorithm builds on the knowledge gained previously.