Summary
In this chapter we studied the Q-learning model, which applies to environments with a finite number of states and a finite number of possible actions.
When performing Q-learning, the AI learns a Q-value for each (state, action) pair through an iterative process: the higher the Q-value, the greater the cumulative reward the AI expects to collect by taking that action in that state.
At each iteration the Q-values are updated through the Bellman equation: the current Q-value is nudged towards the observed reward plus the discounted best Q-value of the next state, by adding the temporal difference scaled by the learning rate. We will get to work on a full practical Q-learning activity in the next chapter, applied to a real-world business problem.
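As a quick recap, the following is a minimal sketch of one tabular Q-learning update. The environment sizes, variable names (n_states, n_actions, alpha, gamma), and example transition are illustrative assumptions, not code from this chapter.

import numpy as np

n_states, n_actions = 5, 3           # assumed sizes of the finite state and action spaces
alpha, gamma = 0.1, 0.9              # learning rate and discount factor (assumed values)

Q = np.zeros((n_states, n_actions))  # Q-values start at zero

def q_learning_update(state, action, reward, next_state):
    """Apply one iteration of the Q-learning (Bellman) update."""
    # Temporal difference: observed reward plus the discounted best Q-value
    # of the next state, minus the current estimate Q(state, action).
    td = reward + gamma * np.max(Q[next_state]) - Q[state, action]
    # Move the current Q-value a fraction alpha along the temporal difference.
    Q[state, action] += alpha * td

# Hypothetical transition: from state 0, action 2 led to state 1 with a reward of 1.
q_learning_update(state=0, action=2, reward=1, next_state=1)

Running this update repeatedly over observed transitions is what makes the Q-values converge towards the expected cumulative rewards described above.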