As we saw in the previous chapter, the Multi-Armed Bandit Problem (MABP) can be thought of as a lite version of RL. In the simplest form of the problem, we have only actions, rewards, and a probability distribution of reward payouts for each action.
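To make that concrete, here is a minimal sketch of such a bandit, assuming a hypothetical three-armed machine with Bernoulli payouts; the arm probabilities and the uniform exploration loop are illustrative, not taken from the text.

```python
import numpy as np

# Hypothetical 3-armed bandit: each arm pays out 1 with a fixed, hidden probability.
PAYOUT_PROBS = np.array([0.2, 0.5, 0.7])  # assumed values for illustration

def pull(arm: int, rng: np.random.Generator) -> int:
    """Return a Bernoulli reward for the chosen arm."""
    return int(rng.random() < PAYOUT_PROBS[arm])

rng = np.random.default_rng(0)
counts = np.zeros(3)
totals = np.zeros(3)
# The agent only ever sees actions and rewards; it must estimate PAYOUT_PROBS by sampling.
for _ in range(1000):
    arm = int(rng.integers(3))   # explore uniformly at random
    reward = pull(arm, rng)
    counts[arm] += 1
    totals[arm] += reward
print(totals / counts)           # empirical estimate of each arm's payout probability
```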
Contextual bandits add a state space, or context, to the bandit problem. The context gives us additional information about the environment: each action's reward distribution is tied to the state we observe, so we don't have to discover the payout distributions from scratch every time the situation changes.
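The sketch below extends the previous example with a discrete context, assuming two hypothetical states whose reward probabilities are made up for illustration. The key difference is that the payout table is now indexed by (state, arm), so the agent estimates one distribution per state rather than a single distribution per arm.

```python
import numpy as np

# Hypothetical contextual bandit: reward probabilities depend on the observed state.
# Rows are states, columns are arms; the values are assumed for illustration.
PAYOUT_PROBS = np.array([
    [0.7, 0.2, 0.1],   # state 0
    [0.1, 0.8, 0.3],   # state 1
])

def step(state: int, arm: int, rng: np.random.Generator) -> int:
    """Return a Bernoulli reward for choosing `arm` while observing `state`."""
    return int(rng.random() < PAYOUT_PROBS[state, arm])

rng = np.random.default_rng(0)
n_states, n_arms = PAYOUT_PROBS.shape
counts = np.zeros((n_states, n_arms))
totals = np.zeros((n_states, n_arms))
for _ in range(5000):
    state = int(rng.integers(n_states))  # the environment reveals a context each round
    arm = int(rng.integers(n_arms))      # explore uniformly; a real agent would condition on state
    reward = step(state, arm, rng)
    counts[state, arm] += 1
    totals[state, arm] += reward
print(totals / counts)  # per-state estimates: one payout distribution per (state, arm) pair
```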