What is a MAB?
A multi-armed bandit (MAB) problem is about identifying, through trial and error, the best action among a set of actions available to an agent. Examples include figuring out the best look for a website among some alternatives, or the best ad banner to run for a product. We will focus on the more common variant of MABs, in which there are k discrete actions available to the agent; this is also known as the k-armed bandit problem.
Let's define the problem in more detail through the example it got its name from.
Problem definition
The MAB problem is named after the case of a gambler who needs to choose a slot machine (bandit) to play in a row of machines:
- When the lever of a machine is pulled, it gives a random reward coming from a probability distribution specific to that machine.
- Although the machines look identical, their reward probability distributions are different.
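To make the setup concrete, here is a minimal sketch of such a row of machines in Python. The `SlotMachine` class, the Gaussian reward distributions, and the specific means are assumptions chosen for illustration; the text only requires that each machine has its own reward distribution.

```python
import random


class SlotMachine:
    """One bandit arm. Rewards are assumed Gaussian for illustration only."""

    def __init__(self, mean, std=1.0, rng=None):
        self.mean = mean  # hidden from the gambler in the real problem
        self.std = std
        self.rng = rng or random.Random()

    def pull(self):
        # Pulling the lever returns a random reward from this machine's distribution.
        return self.rng.gauss(self.mean, self.std)


# Three machines that look identical from the outside but pay out differently.
rng = random.Random(0)  # fixed seed so the run is repeatable
true_means = (1.0, 2.0, 3.0)  # hypothetical values
machines = [SlotMachine(mean, rng=rng) for mean in true_means]


def average_reward(machine, n_pulls=5000):
    """Estimate a machine's expected reward by pulling its lever many times."""
    return sum(machine.pull() for _ in range(n_pulls)) / n_pulls


averages = [average_reward(m) for m in machines]
best = max(range(len(machines)), key=lambda i: averages[i])
print("Estimated average rewards:", [round(a, 2) for a in averages])
print("Machine with the highest estimate:", best)
```

Note that this sketch pulls every lever thousands of times just to estimate the averages. A real gambler cannot afford that many low-reward pulls, which is what makes the choice of machine in each turn a nontrivial problem.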
The gambler is trying to maximize his total reward. So, in each turn, he needs to decide whether to play the...