Reviewing RL concepts
In a way, RL can be defined as learning from rewards. Instead of receiving feedback for every data instance, as is the case in supervised learning, in RL the feedback arrives only after a sequence of actions. Figure 11.1 shows a high-level schematic of an RL system:
Figure 11.1: RL schematic
In an RL setting, we usually have an agent, which does the learning: it learns to make decisions and to act on those decisions. The agent operates within a given environment, which can be thought of as the confined world in which the agent lives, takes actions, and learns from the consequences of those actions. An action here is simply the execution of a decision the agent makes based on what it has learned so far.
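To make these terms concrete, the following is a minimal sketch of the agent-environment interaction loop. The GridEnvironment class and the purely random "agent" are invented here for illustration only (they are not part of this chapter's code); the point is simply to show where the agent, the environment, the actions, and the reward appear in the loop:

import random

class GridEnvironment:
    """Toy environment: the agent moves along a line of 5 cells and
    receives a reward of +1 only when it reaches the last cell."""
    def __init__(self, n_cells=5):
        self.n_cells = n_cells
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action is -1 (move left) or +1 (move right)
        self.state = min(max(self.state + action, 0), self.n_cells - 1)
        done = (self.state == self.n_cells - 1)
        reward = 1.0 if done else 0.0
        return self.state, reward, done

env = GridEnvironment()
state = env.reset()
done = False
total_reward = 0.0

while not done:
    action = random.choice([-1, +1])         # the agent's decision (random here)
    state, reward, done = env.step(action)   # the environment reacts to the action
    total_reward += reward                    # feedback arrives as a reward signal

print(f"Episode finished with total reward: {total_reward}")

Note that the reward here is sparse: the agent receives useful feedback only at the end of the episode, which illustrates why RL feedback is tied to sequences of actions rather than to individual inputs.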
As mentioned earlier, unlike supervised learning, RL does not provide an output for each and every input; that is, the agent does not necessarily receive explicit feedback for every individual action. Instead, the agent works in states...