Another popular method for reinforcement learning is Q-learning. In Q-learning, we don't map an observation directly to a specific action; instead, we try to assign a value to the current state (as described by the observations) and the actions available in it, and we act based on that value. The states and actions can be modeled as a Markov decision process, in which the environment is stochastic. In a Markov decision process, the next state depends only on the current state and the action taken in that state. So, we assume that all previous states (and actions) are irrelevant.
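To make the Markov assumption concrete, here is a minimal sketch of a stochastic environment in which the next state is sampled from the current state and action alone. The two states ("cold", "warm"), the two actions ("heat", "wait"), the probabilities, and the helper name step are all made up for illustration; they do not come from any particular library or from the recipes in this chapter.

```python
import random

# Hypothetical toy MDP (illustrative values only):
# (state, action) -> list of (next_state, probability).
TRANSITIONS = {
    ("cold", "heat"): [("warm", 0.9), ("cold", 0.1)],
    ("cold", "wait"): [("cold", 1.0)],
    ("warm", "heat"): [("warm", 1.0)],
    ("warm", "wait"): [("cold", 0.3), ("warm", 0.7)],
}


def step(state, action, rng):
    """Sample the next state using only the current state and action.

    No history is passed in: that is the Markov assumption.
    """
    next_states, probs = zip(*TRANSITIONS[(state, action)])
    return rng.choices(next_states, weights=probs, k=1)[0]


rng = random.Random(0)  # seeded so that a run is repeatable
state = "cold"
for action in ["heat", "wait", "wait", "heat"]:
    state = step(state, action, rng)
    print(action, "->", state)
```

Note that step never looks at how the current state was reached, and that the same state and action can lead to different next states on different calls, which is what makes the environment stochastic.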
The Q in Q-learning stands for quality; the function Q(s, a) provides a quality score for taking action a in state s. The function can take any form. In its simplest form, it can be a lookup table with one entry per state-action pair. However, in a more complex environment, a table becomes impractical because there are too many states to enumerate, and that's where deep learning comes into play. In the following recipe, we will...