Reinforcement Learning is based on an interesting psychological theory:
Applying a reward immediately after the occurrence of a response increases its probability of reoccurring, while providing punishment after the response will decrease the probability (Thorndike, 1911).
In other words, a reward received immediately after a correct behavior makes that behavior more likely to be repeated, while a punishment applied after an undesired behavior makes that error less likely to recur. Therefore, once a goal has been established, Reinforcement Learning seeks to maximize the rewards received in order to achieve that goal.
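The principle above can be illustrated with a minimal sketch. It is not a method taken from this text: the two-action setup, the softmax choice rule, the reward values of +1 and -1, and the names `softmax`, `run_law_of_effect`, `trials`, and `step` are all assumptions made for illustration. Each action has a numeric preference; a reward raises the preference of the action just taken, and a punishment lowers it.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into selection probabilities."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def run_law_of_effect(trials=200, step=0.1, seed=0):
    """Hypothetical two-action agent: action 0 is the desired behavior
    (rewarded with +1), action 1 is the undesired one (punished with -1)."""
    rng = random.Random(seed)
    prefs = [0.0, 0.0]  # both actions start out equally likely
    for _ in range(trials):
        probs = softmax(prefs)
        action = 0 if rng.random() < probs[0] else 1
        reward = 1.0 if action == 0 else -1.0
        # Reward strengthens the chosen action; punishment weakens it.
        prefs[action] += step * reward
    return softmax(prefs)


final_probs = run_law_of_effect()
print(final_probs)  # probability of the desired action ends well above 0.5
```

Under these assumptions, the desired action becomes steadily more likely over the trials, which is the sense in which the agent learns to maximize the rewards it receives.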
RL finds applications in different contexts in which supervised learning is inefficient.
A very short list includes the following:
- Advertising: RL helps in learning to rank, using one-shot learning for emerging items, and new users will...