ε-greedy actions
An easy-to-implement, effective and widely used approach to the exploration-exploitation problem is what is called ε-greedy actions. Most of the time (i.e. with probability 1-ε), this approach greedily takes the action that is best according to the rewards observed so far in the experiment; once in a while (i.e. with probability ε), however, it takes a random action regardless of how the actions have performed. Here ε is a number between 0 and 1, usually close to zero (e.g. 0.1), so that most decisions "exploit". This way, the method keeps exploring the alternative actions throughout the experiment.
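To make the decision rule concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the implementation developed in this section. The function names `epsilon_greedy_action` and `update_estimate` are hypothetical, and so are the click-through rates in the demo. Each action's value is estimated as the sample average of the rewards observed so far, which is one common choice. Note that an exploratory pick can land on the greedy action by chance, which is standard ε-greedy behavior.

```python
import random


def epsilon_greedy_action(values, eps, rng=random):
    """Pick an action index epsilon-greedily (hypothetical helper).

    values: current value estimates, one per action (e.g. observed CTR per ad)
    eps: exploration probability, a number between 0 and 1
    rng: source of randomness (the `random` module or a random.Random instance)
    """
    if rng.random() < eps:
        # Explore (probability eps): pick any action uniformly at random,
        # ignoring the current estimates.
        return rng.randrange(len(values))
    # Exploit (probability 1 - eps): pick the action with the highest estimate;
    # max() returns the first such action if there is a tie.
    return max(range(len(values)), key=lambda a: values[a])


def update_estimate(values, counts, action, reward):
    """Update the sample-average reward of `action` incrementally (hypothetical helper)."""
    counts[action] += 1
    values[action] += (reward - values[action]) / counts[action]


if __name__ == "__main__":
    rng = random.Random(0)
    true_ctr = [0.01, 0.012, 0.03]  # hypothetical click-through rates of three ads
    values = [0.0] * len(true_ctr)
    counts = [0] * len(true_ctr)
    for _ in range(10_000):
        ad = epsilon_greedy_action(values, eps=0.1, rng=rng)
        reward = 1 if rng.random() < true_ctr[ad] else 0  # 1 = click, 0 = no click
        update_estimate(values, counts, ad, reward)
    print("times each ad was shown:", counts)
    print("estimated CTRs:", [round(v, 4) for v in values])
```

With ε = 0.1, roughly 90% of the decisions go to the ad that currently looks best. The remaining random picks keep refreshing the estimates of the other ads, so a better ad that started out with an unlucky estimate can still be discovered later in the experiment.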
Application to the online advertising scenario
Now, let's apply ε-greedy actions to our online advertising scenario.
- We start by initializing the necessary variables for the experiment, which will keep track of the action value estimates, the number of times each ad has been...