Reinforcement learning is a family of algorithms that learn from interaction and adapt to changes in their environment. The algorithm receives external feedback in response to its choices: a correct choice earns a reward, while a wrong choice incurs a penalty. The goal of the system is to achieve the best possible result, that is, to collect as much reward as it can over time.
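The reward-and-penalty idea can be illustrated with a minimal sketch. It is not a specific algorithm from this text: it assumes a toy environment with three actions, each paying +1 with a fixed probability and -1 otherwise, and an agent that mostly picks the action with the best estimate so far and occasionally picks at random (often called epsilon-greedy). The names `run_agent`, `reward_prob`, and `epsilon` are hypothetical and chosen only for illustration.

```python
import random

def run_agent(reward_prob, steps=5000, epsilon=0.1, seed=0):
    """Hypothetical helper: learn action values from rewards (+1) and penalties (-1)."""
    rng = random.Random(seed)
    n_actions = len(reward_prob)
    values = [0.0] * n_actions  # running estimate of each action's average outcome
    counts = [0] * n_actions    # how many times each action has been tried

    for _ in range(steps):
        # Occasionally explore a random action, otherwise exploit the best estimate.
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda a: values[a])

        # The environment answers with a reward or a penalty,
        # never with the identity of the "right" action.
        outcome = 1.0 if rng.random() < reward_prob[action] else -1.0

        # Incremental mean: move the estimate toward the observed outcome.
        counts[action] += 1
        values[action] += (outcome - values[action]) / counts[action]

    return values, counts

# Assumed toy environment: action 2 is rewarded most often.
values, counts = run_agent(reward_prob=[0.1, 0.5, 0.9])
best_action = max(range(len(values)), key=lambda a: values[a])
print("estimated values:", [round(v, 2) for v in values])
print("preferred action:", best_action)
```

The agent is never told which action is correct. It only accumulates outcomes, and the estimates gradually separate the frequently rewarded action from the others.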
In supervised learning, the correct output is specified explicitly for every input (learning with a teacher). This is not always possible. Often, we only have qualitative information about how good an outcome was, and this information is called a reinforcement signal. In these cases, the signal does not say how to update the agent's behavior (for example, its weights), so you cannot define a cost function against a known correct output or compute its gradient. The goal is to create smart agents that are able to learn from their experience...
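The difference between the two kinds of feedback can be made concrete with a small sketch. Both functions below are hypothetical illustrations rather than part of any library: `supervised_feedback` returns a signed error because the target is known, while `reinforcement_feedback` returns only a score for the action that was actually tried.

```python
def supervised_feedback(prediction, target):
    """Instructive feedback: the signed error says how far off we are and in which direction."""
    return target - prediction

def reinforcement_feedback(action, hidden_best_action):
    """Evaluative feedback: only a scalar score for the action that was tried."""
    return 1.0 if action == hidden_best_action else -1.0

# Supervised case: one correction is enough because the target is known.
prediction = 2.0
prediction += supervised_feedback(prediction, target=5.0)

# Reinforcement case: the agent has to try actions until one is rewarded.
hidden_best_action = 3  # assumed: known to the environment, never shown to the agent
tried = []
for action in range(5):
    tried.append(action)
    if reinforcement_feedback(action, hidden_best_action) > 0:
        break

print("corrected prediction:", prediction)
print("actions tried before a reward:", tried)
```

In the supervised case the error itself points to the fix. In the reinforcement case a penalty only says that the attempted action was wrong, so the agent has to explore alternatives to find a better one.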