Using function approximation for actions
In our online advertising examples so far, we have assumed that there is a fixed set of ads (actions/arms) to choose from. However, in many applications of contextual bandits, the set of available actions changes over time. Take the example of a modern advertising network that uses an ad server to match ads to websites/apps. This is a highly dynamic operation that, leaving pricing aside, involves three major components:
- Website/app content,
- Viewer/user profile,
- Ad inventory.
Previously, we considered only the user profile as the context. An ad server also needs to take the website/app content into account, but this does not really change the structure of the problem we solved before. What does change is that we can no longer use a separate model for each ad, because the ad inventory is dynamic: ads are added and removed over time. Instead, we use a single model that takes the ad's features as input along with the context. This is illustrated in Figure 5.
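To make the idea of one shared model concrete, here is a minimal sketch under several assumptions of our own. It is not the model in Figure 5, which may well be a neural network. We assume a logistic-regression click model, epsilon-greedy exploration, and a toy setup with one-hot user interests and one-hot ad categories. We also assume hand-built context × ad product features. Without them, a linear model over a plain concatenation would rank ads identically for every user. All names (`SharedAdModel`, `make_input`, `run_demo`, the ad IDs) are hypothetical. The point to notice is that ads are identified only by their feature vectors, so when the inventory churns, new ads are scored immediately with no per-ad model to train.

```python
import math
import random
from typing import Dict, List, Sequence


def make_input(context: Sequence[float], ad: Sequence[float]) -> List[float]:
    # Context features, ad features, and their pairwise products; the products
    # let a linear model express "this kind of user likes this kind of ad".
    return list(context) + list(ad) + [c * a for c in context for a in ad]


class SharedAdModel:
    """A single logistic-regression click model shared by every ad (hypothetical sketch)."""

    def __init__(self, n_inputs: int, lr: float = 0.1, epsilon: float = 0.1, seed: int = 0):
        self.w = [0.0] * n_inputs
        self.b = 0.0
        self.lr = lr
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def predict(self, x: Sequence[float]) -> float:
        # Estimated click probability for one (context, ad) input vector.
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        return 1.0 / (1.0 + math.exp(-z))

    def choose(self, context: Sequence[float], inventory: Dict[str, Sequence[float]]) -> str:
        # Epsilon-greedy over whichever ads are in the inventory right now;
        # an ad never seen before is scored from its features alone.
        ad_ids = list(inventory)
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ad_ids)
        return max(ad_ids, key=lambda ad_id: self.predict(make_input(context, inventory[ad_id])))

    def update(self, context: Sequence[float], ad: Sequence[float], clicked: bool) -> None:
        # One stochastic gradient step on the log loss of the observed outcome.
        x = make_input(context, ad)
        err = (1.0 if clicked else 0.0) - self.predict(x)
        self.w = [wi + self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b += self.lr * err


def run_demo(rounds: int = 2000, seed: int = 0) -> SharedAdModel:
    # Toy setup (hypothetical): one-hot user interest [sports, tech] and
    # one-hot ad category [sports, tech]; a user clicks iff the two match.
    rng = random.Random(seed)
    contexts = [[1.0, 0.0], [0.0, 1.0]]
    model = SharedAdModel(n_inputs=len(make_input(contexts[0], contexts[0])), seed=seed + 1)
    for t in range(rounds):
        # The inventory churns halfway through: old ads leave, new ones arrive.
        if t < rounds // 2:
            inventory = {"sports_1": [1.0, 0.0], "tech_1": [0.0, 1.0]}
        else:
            inventory = {"sports_2": [1.0, 0.0], "tech_2": [0.0, 1.0]}
        context = rng.choice(contexts)
        ad_id = model.choose(context, inventory)
        clicked = context == inventory[ad_id]
        model.update(context, inventory[ad_id], clicked)
    return model


if __name__ == "__main__":
    model = run_demo()
    model.epsilon = 0.0  # act greedily for this illustration
    # Rank two brand-new ads the model was never trained on, by features only.
    print(model.choose([0.0, 1.0], {"sports_3": [1.0, 0.0], "tech_3": [0.0, 1.0]}))
```

Because the weights belong to features rather than to ad IDs, retiring `sports_1`/`tech_1` and introducing `sports_2`/`tech_2` (or `sports_3`/`tech_3`) requires no structural change to the model. In practice, the product features would usually be learned by a richer function approximator rather than written by hand.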