In the previous recipe, Model-based RL using MDPtoolbox, we followed a model-based approach to solve an RL problem. Model-based approaches become impractical as the state and action spaces grow. Model-free reinforcement learning algorithms, on the other hand, rely on the agent's trial-and-error interaction with the environment that represents the problem at hand. In this recipe, we will use a model-free approach to implement RL with the ReinforcementLearning package in R. This package utilizes a popular model-free algorithm known as Q-learning. Q-learning is an off-policy algorithm: it learns the value of the optimal (greedy) policy while the agent follows a different, exploratory behavior policy, one that explores the environment and exploits its current knowledge at the same time (for example, an epsilon-greedy policy).
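As a quick preview of what the package looks like in use, the following minimal sketch learns a policy from sampled experience using the package's own built-in 2x2 gridworld example. The state names, action names, and hyperparameter values here are illustrative defaults taken from that example, not requirements of the API:

```r
# A minimal sketch of Q-learning with the ReinforcementLearning package.
# Assumes the package is installed: install.packages("ReinforcementLearning")
library(ReinforcementLearning)

# The package ships a small 2x2 gridworld environment function
env <- gridworldEnvironment

# Sample experience tuples (State, Action, Reward, NextState) by
# interacting with the environment at random
set.seed(123)
data <- sampleExperience(N = 1000,
                         env = env,
                         states = c("s1", "s2", "s3", "s4"),
                         actions = c("up", "down", "left", "right"))

# Learn a policy from the sampled experience with Q-learning;
# alpha = learning rate, gamma = discount factor,
# epsilon = exploration rate (illustrative values)
model <- ReinforcementLearning(data,
                               s = "State", a = "Action",
                               r = "Reward", s_new = "NextState",
                               control = list(alpha = 0.1,
                                              gamma = 0.5,
                                              epsilon = 0.1))

# Inspect the learned state-action values and policy
print(model)
```

We will walk through each of these steps in detail in the sections that follow.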
Q-learning is guaranteed to converge to an optimal policy, but to do so it relies on continuous interaction between the agent and its environment, which makes it computationally heavy. This...