Why we need function approximations
While solving (contextual) multi-armed bandit problems, our goal is to learn the action value of each arm (action) from our observations, which we have denoted by Q(a). In the online advertising example, it represented our estimate of the probability that a user clicks the ad if we display ad a. Now, assume that we have two pieces of information about the user seeing the ad, namely:
- Device type (e.g. mobile vs. desktop), and
- Location (e.g. domestic / U.S. vs. international / non-U.S.)
It is quite likely that ad performance will differ with device type and location, which together make up the context in this example. A contextual bandit (CB) model will therefore leverage this information, estimate the action values separately for each context, and choose its actions accordingly.
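The per-context estimation described above can be sketched in a few lines of Python. This is only a minimal illustration, not the implementation used elsewhere in this text: the class name `TabularContextualBandit`, the ad names `A`, `B`, `C`, and the click observations are all made up for the example. Each (device, location) pair gets its own set of action-value estimates, updated as a running average of the observed clicks.

```python
from itertools import product

# The two context features from the example above.
DEVICES = ("mobile", "desktop")
LOCATIONS = ("domestic", "international")
ADS = ("A", "B", "C")  # hypothetical ad names, for illustration only


class TabularContextualBandit:
    """Keeps a separate action-value estimate for every (context, ad) pair."""

    def __init__(self, contexts, actions):
        self.actions = tuple(actions)
        self.counts = {c: {a: 0 for a in self.actions} for c in contexts}
        self.values = {c: {a: 0.0 for a in self.actions} for c in contexts}

    def update(self, context, action, reward):
        # Incremental sample average: Q <- Q + (r - Q) / n
        self.counts[context][action] += 1
        n = self.counts[context][action]
        q = self.values[context][action]
        self.values[context][action] = q + (reward - q) / n

    def best_action(self, context):
        # Greedy choice within the given context only.
        return max(self.actions, key=lambda a: self.values[context][a])


contexts = list(product(DEVICES, LOCATIONS))  # 2 x 2 = 4 contexts
bandit = TabularContextualBandit(contexts, ADS)

# Made-up (context, ad shown, clicked?) observations.
observations = [
    (("mobile", "domestic"), "A", 1),
    (("mobile", "domestic"), "A", 0),
    (("mobile", "domestic"), "B", 1),
    (("desktop", "international"), "C", 1),
]
for context, ad, clicked in observations:
    bandit.update(context, ad, clicked)

print(bandit.values[("mobile", "domestic")])
print(bandit.best_action(("mobile", "domestic")))
```

Note that in this sketch every context keeps its own estimates, so an observation made in one context does not change the estimates in any other.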
This would look like filling in a table similar to the one below for each ad:
This means solving four MAB problems, one for...