Using function approximation for context
Function approximations allow us to model the dynamics of a process from which we have observed data, such as contexts and ad clicks. As in the previous chapter, consider an online advertising scenario with five different ads (A, B, C, D, and E), where the context comprises the user's device, location, and age. In this section, our agent will learn five different Q functions, one per ad, each receiving a context and returning the action-value estimate. This is illustrated in Figure 1.
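To make the idea of one Q function per ad concrete, here is a minimal sketch in plain Python. It assumes a logistic regression model trained with stochastic gradient descent, and it treats a click as 1 and no click as 0. The names encode, LogisticQ and action_values, the feature encoding (a "mobile" flag, a "US" flag, and age scaled by 100), and the learning rate are all hypothetical choices made for illustration, not the implementation used in this chapter.

```python
import math

ADS = ["A", "B", "C", "D", "E"]


def encode(context):
    """Hypothetical encoding: bias, mobile flag, US flag, scaled age."""
    return [
        1.0,
        1.0 if context["device"] == "mobile" else 0.0,
        1.0 if context["location"] == "US" else 0.0,
        context["age"] / 100.0,
    ]


class LogisticQ:
    """Q function for a single ad: estimated click probability for a context."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features  # weights start at zero, so Q starts at 0.5
        self.lr = lr                 # learning rate (assumed value)

    def predict(self, x):
        z = sum(wi * xi for wi, xi in zip(self.w, x))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, click):
        # One stochastic gradient step on the log loss for this observation.
        error = self.predict(x) - click
        self.w = [wi - self.lr * error * xi for wi, xi in zip(self.w, x)]


# Five separate Q functions, one per ad.
q_functions = {ad: LogisticQ(n_features=4) for ad in ADS}


def action_values(context):
    """Return the action-value estimate of every ad for the given context."""
    x = encode(context)
    return {ad: q.predict(x) for ad, q in q_functions.items()}


# Made-up observations for ad A only: mobile users click, desktop users do not.
ctx_mobile = {"device": "mobile", "location": "US", "age": 30}
ctx_desktop = {"device": "desktop", "location": "US", "age": 30}
for _ in range(200):
    q_functions["A"].update(encode(ctx_mobile), 1)
    q_functions["A"].update(encode(ctx_desktop), 0)

values = action_values(ctx_mobile)
best_ad = max(values, key=values.get)  # greedy choice for this context
```

Because each ad has its own model, an observed click (or non-click) updates only the Q function of the ad that was shown. In this toy run only ad A has seen data, so the other four ads keep their initial estimate.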
At this point, we have a supervised machine learning problem to solve for each action. We can use different models to obtain the Q functions, such as logistic regression or a neural network (the latter also lets us use a single network that estimates the values of all actions at once). Once we choose the type of function approximation...