Reinforcement learning
Reinforcement learning (RL) is not a new idea or technique. Its roots date back to the 1950s, when Richard Bellman introduced the Bellman equation (Sutton and Barto, 2018). However, its recent combination with human feedback, which we explain in the next section, has opened new opportunities for its use in developing machine learning technologies. The general idea of RL is to learn from experience, that is, through interaction with an environment, rather than from a fixed, previously collected set of data points as in supervised learning. In RL, an agent learns which actions to take in order to obtain greater reward (Kaelbling et al., 1996). After each action, the agent receives a reward and uses it to iteratively improve its strategy for choosing actions, known in technical terms as its policy.
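To make this agent-environment loop concrete, the following is a minimal sketch of tabular Q-learning, one classic RL algorithm chosen here purely for illustration (it is not a method proposed in this text). The five-state "chain" environment, its reward of 1 for reaching the last state, and the hyperparameters `alpha`, `gamma`, and `epsilon` are all hypothetical assumptions. The agent is never given a dataset: it improves its policy only from the rewards it observes after each action, and its update target is a sample-based form of the Bellman optimality equation.

```python
import random

# Hypothetical toy environment: states 0..4 on a line; state 4 is the goal.
N_STATES = 5
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    if next_state == N_STATES - 1:
        return next_state, 1.0, True  # reaching the goal yields reward 1
    return next_state, 0.0, False     # every other transition yields 0


def greedy(q, state, rng):
    """Pick the highest-valued action in this state, breaking ties at random."""
    best = max(q[state])
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Learn an action-value table Q(s, a) purely from interaction."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy policy: mostly exploit, occasionally explore.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = greedy(q, state, rng)
            next_state, reward, done = step(state, action)
            # Bellman-style target: reward plus discounted best next value.
            target = reward if done else reward + gamma * max(q[next_state])
            # Move the estimate a fraction alpha toward the target.
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q = q_learning()
# The learned (greedy) policy for every non-terminal state.
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
print(policy)  # [1, 1, 1, 1]: always move right toward the goal
```

Because the toy environment is deterministic and the random generator is seeded, the learned values approach the discounted returns (for example, about 0.9³ = 0.729 for moving right from state 0), and the resulting policy always moves right. Q-learning is only one of many RL algorithms; it is used here because its update rule maps directly onto the reward-driven, iterative policy improvement described above.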
In the history of RL, two important developments and applications led to an increase in its...