Let's go through a brief comparison of some popular action selection strategies. We'll focus on a few in particular:
- Greedy strategy
- Epsilon-greedy strategy
- Upper confidence bound
Outside the AI space, many of the problems that reinforcement learning addresses are studied under the name dynamic programming (or approximate dynamic programming). Upper confidence bound is a strategy that also appears in that setting, in fields such as economics. It is based on the principle of optimism in the face of uncertainty and places a high priority on exploration.
With upper confidence bound, we assume we are better off exploring our environment as much as we can, and we presume that paths we have not yet seen will lead to high rewards. We'll see how this works in the following strategy selection sections.
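To make the comparison concrete, here is a minimal sketch of the three strategies as plain Python functions. It is an illustration under stated assumptions rather than the implementation used later: the function names, the `q_values` and `counts` lists, and the exploration weight `c` are all hypothetical, and the bonus term follows the common UCB1-style form, which may differ from the exact formula in the following sections.

```python
import math
import random


def greedy(q_values):
    """Pick the action with the highest estimated value (ties go to the lowest index)."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a random action; otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return greedy(q_values)


def ucb(q_values, counts, t, c=2.0):
    """UCB1-style selection (assumed form): estimated value plus an exploration bonus.

    q_values: current value estimate for each action
    counts:   how many times each action has been tried
    t:        total number of selections made so far (must be >= 1)
    c:        hypothetical weight on the exploration bonus
    """
    # Optimism in the face of uncertainty: any untried action is chosen first.
    for action, n in enumerate(counts):
        if n == 0:
            return action
    # Rarely tried actions get a larger bonus, so they look more attractive.
    scores = [q + c * math.sqrt(math.log(t) / n) for q, n in zip(q_values, counts)]
    return max(range(len(scores)), key=lambda a: scores[a])
```

For example, with `q_values = [0.5, 0.4]` and `counts = [100, 1]` at `t = 101`, `greedy` selects action 0 because its estimate is higher, while `ucb` selects action 1 because that action has been tried only once and so carries a much larger exploration bonus.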