The simplest kind of Multi-Armed Bandit Problem (MABP) is a two-armed bandit. At each iteration, we choose one of two arms to pull, guided by our current knowledge of the payout probability of each arm. We'll walk through a demonstration of a two-armed bandit iteration in this section.
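To make this concrete, here's a minimal sketch of a two-armed bandit and a single iteration. The payout probabilities, function names, and the random arm choice are all illustrative placeholders, not values or methods taken from the text:

```python
import random

# Each arm pays out 1 with a fixed (hidden) probability, and 0 otherwise.
# These probabilities are unknown to the player; the values here are
# placeholders chosen purely for illustration.
TRUE_PAYOUT_PROBS = [0.35, 0.65]  # arm 0, arm 1

def pull(arm: int) -> int:
    """Pull the given arm; return 1 on a payout, 0 otherwise."""
    return 1 if random.random() < TRUE_PAYOUT_PROBS[arm] else 0

# One iteration: pick an arm (here, at random) and observe the result.
arm = random.choice([0, 1])
reward = pull(arm)
print(f"Pulled arm {arm}, got reward {reward}")
```

In a real run we wouldn't pick the arm at random every time; the point of the rest of this section is how our accumulated knowledge of each arm informs that choice.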
As the investigation progresses, we want to track our knowledge of each arm's payout as a probability distribution. This knowledge will help us determine our betting strategy.
When we first begin investigating the frequency of an unknown event, we have no information about how likely that event is to occur. It's useful to think of the probability distribution we build up over time as a representation of our knowledge about that event, or conversely, of our remaining ignorance about it.
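One common way to represent this kind of knowledge is with a Beta distribution, which is a sketch of how the idea might look in code (the class and field names here are assumptions for illustration, not part of the original text). A Beta(1, 1) distribution is uniform over [0, 1], which matches a state of complete ignorance: every payout probability looks equally plausible until we observe some pulls.

```python
# Hypothetical sketch: tracking our knowledge of one arm's payout
# probability as a Beta(alpha, beta) distribution.
class ArmKnowledge:
    def __init__(self) -> None:
        self.alpha = 1  # 1 + number of payouts observed
        self.beta = 1   # 1 + number of misses observed

    def update(self, reward: int) -> None:
        """Record one pull: a payout raises alpha, a miss raises beta."""
        if reward == 1:
            self.alpha += 1
        else:
            self.beta += 1

    def mean_estimate(self) -> float:
        """Expected payout probability under our current belief."""
        return self.alpha / (self.alpha + self.beta)
```

As observations accumulate, the distribution narrows around the arm's true payout probability, reflecting our growing knowledge and shrinking ignorance.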
In other words, we only have the information we...