Performing model-free learning
Unlike model-based learning, where the transition dynamics are explicitly provided (as transition probabilities from one state to another), in model-free learning the transitions must be deduced and learned directly from interaction with the environment (by taking actions and observing the resulting states and rewards) rather than being explicitly provided. Widely used frameworks for model-free learning are Monte Carlo methods and the Q-learning technique. The former is simple to implement but can be slow to converge, since it must wait for complete episodes before updating its value estimates, whereas the latter is more complex to implement but converges efficiently due to off-policy learning.
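For reference, the two update rules can be written as follows (standard textbook notation, not this recipe's own code): a Monte Carlo update moves the value estimate toward the full sampled return of an episode, while the Q-learning update bootstraps from the current estimate of the next state and always uses the greedy (max) action in its target, regardless of the action the agent actually takes next, which is what makes it off-policy:

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \big[\, G_t - Q(s_t, a_t) \,\big] \quad \text{(Monte Carlo)}$$

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \big[\, r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \,\big] \quad \text{(Q-learning)}$$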
Getting ready
In this section, we will implement the Q-learning algorithm in R. Q-learning balances exploration of the surrounding environment with exploitation of existing knowledge, and it is termed off-policy because its value updates target the greedy policy even while the agent follows an exploratory behavior policy. For example, an agent in a particular state first explores the possible actions of transitioning into next states and observes the corresponding rewards, and then exploits its current knowledge by selecting the action with the maximum estimated value.
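Before turning to the full recipe, the following minimal sketch shows what a single epsilon-greedy Q-learning step looks like in R. It is only an illustration under assumed values: the number of states and actions, the reward of +1, and the parameter settings are hypothetical placeholders, not the environment used later in this recipe.

```r
# Minimal sketch of one epsilon-greedy Q-learning step (illustrative only;
# the environment size, reward value, and parameters below are hypothetical).
set.seed(42)

n_states  <- 4
n_actions <- 2
Q <- matrix(0, nrow = n_states, ncol = n_actions)   # action-value table

alpha   <- 0.1    # learning rate
gamma   <- 0.9    # discount factor
epsilon <- 0.1    # exploration probability

epsilon_greedy <- function(Q, s) {
  # Explore with probability epsilon, otherwise exploit current estimates.
  if (runif(1) < epsilon) sample(n_actions, 1) else which.max(Q[s, ])
}

q_learning_step <- function(Q, s, a, r, s_next) {
  # Off-policy update: the target uses the max over next-state actions,
  # regardless of which action the behavior policy takes next.
  Q[s, a] <- Q[s, a] + alpha * (r + gamma * max(Q[s_next, ]) - Q[s, a])
  Q
}

# One illustrative transition: from state 1, pick an action, receive a
# made-up reward of +1, and land in state 2.
s <- 1
a <- epsilon_greedy(Q, s)
Q <- q_learning_step(Q, s, a, r = 1, s_next = 2)
Q
```

Running such a step repeatedly over many transitions is what gradually fills in the Q table; the recipe that follows builds this loop out for a concrete environment.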