The questions are as follows:
- What is a multi-armed bandit (MAB) problem?
- What is an explore-exploit dilemma?
- What is the significance of epsilon in the epsilon-greedy policy?
- How do we solve an explore-exploit dilemma?
- What is the upper confidence bound (UCB) algorithm?
- How does Thompson sampling differ from the UCB algorithm?