Summary
In this chapter, we introduced the multi-armed bandit (MAB) problem and its motivation as a reinforcement learning and artificial intelligence problem. We explored several algorithms that are commonly used to solve the MAB problem, including the Greedy algorithm and its variants, UCB, and Thompson Sampling. Through these algorithms, we encountered distinct heuristics for balancing exploration and exploitation, one of the most fundamental challenges in reinforcement learning: random exploration, optimism under uncertainty, and sampling from Bayesian posterior distributions.
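As a brief reminder of how these three heuristics differ in code, the following sketch (not the chapter's exact implementation; all function names here are illustrative) shows one arm-selection rule per approach for a Bernoulli bandit:

```python
import math
import random

# Illustrative sketch: each function returns the index of the arm to pull,
# given per-arm pull counts and cumulative rewards.

def epsilon_greedy(counts, rewards, epsilon=0.1, rng=random):
    """Random exploration: with probability epsilon, pick a random arm."""
    if rng.random() < epsilon:
        return rng.randrange(len(counts))
    means = [r / c if c > 0 else 0.0 for r, c in zip(rewards, counts)]
    return max(range(len(means)), key=means.__getitem__)

def ucb1(counts, rewards):
    """Optimism under uncertainty: add a confidence bonus to each mean."""
    # Play each arm once before applying the bonus formula.
    for i, c in enumerate(counts):
        if c == 0:
            return i
    t = sum(counts)
    scores = [r / c + math.sqrt(2 * math.log(t) / c)
              for r, c in zip(rewards, counts)]
    return max(range(len(scores)), key=scores.__getitem__)

def thompson_sampling(counts, rewards, rng=random):
    """Bayesian posterior sampling: draw from each arm's Beta posterior
    (successes r, failures c - r, with a Beta(1, 1) prior)."""
    samples = [rng.betavariate(1 + r, 1 + c - r)
               for r, c in zip(rewards, counts)]
    return max(range(len(samples)), key=samples.__getitem__)
```

Note that all three rules consume the same statistics (counts and cumulative rewards); only the scoring of arms changes.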
We put this knowledge into practice by implementing these algorithms from scratch in Python. In doing so, we also examined why MAB algorithms must be analyzed over many repeated experiments to obtain robust results; this procedure is integral to any analysis that involves randomness. Finally, in this chapter's activity, we applied our knowledge...
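The repeated-experiments procedure can be sketched as follows. This is a minimal illustration, assuming a Bernoulli bandit and a hypothetical epsilon-greedy rule; the helper names are not from the chapter:

```python
import random

def eps_greedy(counts, rewards, rng, epsilon=0.1):
    """Illustrative selection rule: explore at random, else pick the best mean."""
    if rng.random() < epsilon or 0 in counts:
        return rng.randrange(len(counts))
    means = [r / c for r, c in zip(rewards, counts)]
    return max(range(len(means)), key=means.__getitem__)

def run_bandit(select_arm, true_probs, horizon, rng):
    """One simulated run; returns total regret versus the best arm."""
    counts = [0] * len(true_probs)
    rewards = [0.0] * len(true_probs)
    best = max(true_probs)
    regret = 0.0
    for _ in range(horizon):
        arm = select_arm(counts, rewards, rng)
        reward = 1.0 if rng.random() < true_probs[arm] else 0.0
        counts[arm] += 1
        rewards[arm] += reward
        regret += best - true_probs[arm]
    return regret

def average_regret(select_arm, true_probs, horizon=1000, n_runs=50, seed=0):
    """Average regret over many independently seeded runs; a single run's
    regret is too noisy to compare algorithms reliably."""
    total = 0.0
    for i in range(n_runs):
        rng = random.Random(seed + i)
        total += run_bandit(select_arm, true_probs, horizon, rng)
    return total / n_runs
```

Averaging over independently seeded runs is what makes the resulting regret curves stable enough to compare one algorithm against another.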