In March 2016, AlphaGo, the program developed by Google's DeepMind, defeated 18-time world champion Lee Sedol, one of the world's best Go players, by 4 games to 1. The match was historic because Go is a notoriously difficult game for computers to play, with:
208,168,199,381,979,984,699,478,633,344,862,770,286,522,
453,884,530,548,425,639,456,820,927,419,612,738,015,378,
525,648,451,698,519,643,907,259,916,015,628,128,546,089,
888,314,427,129,715,319,317,557,736,620,397,247,064,840,935
possible legal board positions, a number on the order of 10^170. A game with this many positions cannot be won by simple brute-force search. It requires skill, creativity, and, as professional Go players say, intuition.
AlphaGo accomplished this remarkable feat using deep neural networks trained with reinforcement learning (RL), combined with a state-of-the-art tree search algorithm. This chapter introduces RL and some algorithms that we employ...