Q-learning is one of the most widely used reinforcement learning algorithms, largely because it can compare the expected utility of the available actions without requiring a model of the environment. With this technique, it is possible to find an optimal action for every state of a finite MDP.
Keras DQNs
Q-learning
A general approach to the reinforcement learning problem is to estimate an evaluation function during the learning process. This function must be able to evaluate, through the sum of the rewards, how good (or otherwise) a particular policy is. In fact, Q-learning tries to maximize the value of the Q function (action-value function), which represents the maximum discounted future reward when we...
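To make the idea of learning an action-value function concrete, here is a minimal sketch of tabular Q-learning in plain Python. The environment is a hypothetical five-state chain invented for illustration and does not come from the text: the agent starts in state 0, action 0 moves left, action 1 moves right, and reaching the last state yields a reward of 1 and ends the episode. The function names (`q_learning`, `greedy_policy`) and the hyperparameter values (`alpha`, `gamma`, `epsilon`) are likewise assumptions. Note that the update uses only sampled transitions and never a model of the environment.

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.1, gamma=0.9,
               epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical chain of n_states states.

    Actions: 0 = move left, 1 = move right. Reaching the last state
    gives reward 1.0 and ends the episode; every other step gives 0.0.
    """
    rng = random.Random(seed)
    goal = n_states - 1
    # Q-table: one row per state, one column per action, initialised to zero.
    q = [[0.0, 0.0] for _ in range(n_states)]

    for _ in range(episodes):
        state = 0
        while state != goal:
            # Epsilon-greedy action selection; ties are broken at random.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1

            # Sample one transition from the (assumed) chain environment.
            next_state = max(state - 1, 0) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0

            # Q-learning update:
            # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
            # The terminal state has no future reward.
            best_next = 0.0 if next_state == goal else max(q[next_state])
            q[state][action] += alpha * (reward + gamma * best_next
                                         - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Return the greedy action for every non-terminal state."""
    return [0 if row[0] > row[1] else 1 for row in q[:-1]]


if __name__ == "__main__":
    q_table = q_learning()
    print(greedy_policy(q_table))
```

In this toy chain, the learned greedy policy should choose "right" in every non-terminal state, and the Q-values for moving right should decay by roughly a factor of `gamma` for each extra step of distance from the goal, which is the discounting of future reward described above.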