Upper confidence bounds applied to trees
To recap, Min-Max gives us the actual best move in a position, given perfect information; however, MCTS only gives an average value; though it allows us to work with much larger state spaces that cannot be evaluated with Min-Max. Is there a way that we could improve MCTS so it could converge to the Min-Max algorithm if enough evaluations are given? Yes, Monte Carlo Tree Search with Confidence bounds applied to Trees (UCT) does exactly this. The idea behind it is to treat MCTS like a multiarmed bandit problem. The multiarmed bandit problem is that we have a group of slot machines—one armed bandits—each of which has an undetermined payout and average amount of money received per play. The payout for each machine is random, but the mean payout may vary significantly. How should we determine which slot machines to play?
There are two factors that need to be considered when choosing a slot machine. The first is the obvious one, an exploitative value, which...