Upper confidence bound
The upper confidence bound (UCB) approach assumes that the expected reward of an action depends linearly on the d-dimensional context.
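One common way to write this linearity assumption is sketched below. The notation is borrowed from the LinUCB formulation and is not defined in this section: x_{t,a} stands for the d-dimensional context of arm a at step t, and θ_a for an unknown coefficient vector of that arm.

```latex
\mathbb{E}\left[ r_{t,a} \mid \mathbf{x}_{t,a} \right]
  = \mathbf{x}_{t,a}^{\top} \boldsymbol{\theta}_{a},
\qquad \mathbf{x}_{t,a},\ \boldsymbol{\theta}_{a} \in \mathbb{R}^{d}
```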
Confidence interval
Intuitively, confidence in the estimated reward of a given arm increases as the arm is played. When an arm has rarely been played, the variance of its estimated reward is high. The confidence interval, which is derived from this variance, represents the uncertainty about the reward of the arm, and it narrows as the arm is played more often.
The goal of exploration is to play arms with a large confidence interval around their mean reward, so that they can become candidates for exploitation.
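The narrowing of the confidence interval can be illustrated with a minimal sketch. The text does not specify a formula for the interval, so the sketch assumes the classic UCB1 radius, sqrt(2 ln t / n), where t is the total number of plays and n the number of plays of the arm. The function name `confidence_radius` is hypothetical.

```python
import math


def confidence_radius(total_plays: int, arm_plays: int) -> float:
    """Half-width of the confidence interval of one arm.

    Assumption: UCB1-style radius sqrt(2 * ln(t) / n); the source text
    does not state which bound it uses.
    """
    if arm_plays == 0:
        # An arm that was never played has unbounded uncertainty.
        return math.inf
    return math.sqrt(2.0 * math.log(total_plays) / arm_plays)


# With the total number of plays held fixed, the radius shrinks
# as the arm itself is played more often.
radii = [confidence_radius(1000, n) for n in (1, 10, 100, 1000)]
for n, radius in zip((1, 10, 100, 1000), radii):
    print(f"plays={n:5d}  radius={radius:.3f}")
```

Under this assumed formula, a rarely played arm keeps a wide interval, which is what makes it attractive during exploration.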
The following diagram illustrates the process of exploration [14:7]:
The exploration phase favors playing arm i in order to reduce its wide confidence interval. The exploitation phase selects arm j because it has the highest mean reward and a very narrow confidence interval.
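The contrast between the two phases can be sketched as follows. This is an illustration under stated assumptions, not the method of the text: it uses the UCB1 score (mean reward plus sqrt(2 ln t / n)), and the mean rewards and play counts for arms i and j are made-up values. The function name `ucb_scores` is hypothetical.

```python
import math


def ucb_scores(means, plays):
    """Upper confidence bound of each arm: mean + sqrt(2 * ln(t) / n).

    Assumptions: UCB1 formula, and every arm has been played at least once.
    """
    total = sum(plays)
    return [
        mean + math.sqrt(2.0 * math.log(total) / n)
        for mean, n in zip(means, plays)
    ]


# Made-up statistics: arm i (index 0) is rarely played, arm j (index 1)
# has the highest mean reward and has been played many times.
means = [0.40, 0.60]
plays = [2, 200]

scores = ucb_scores(means, plays)

# Exploration: pick the arm with the highest upper confidence bound.
explore = max(range(len(scores)), key=scores.__getitem__)
# Exploitation: pick the arm with the highest mean reward.
exploit = max(range(len(means)), key=means.__getitem__)

print("exploration picks arm index", explore)
print("exploitation picks arm index", exploit)
```

With these values the wide interval of arm i dominates its score, so exploration selects it, while exploitation selects arm j on its mean reward alone.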
The...