As we have already said, reinforcement learning is a programming philosophy that aims to create algorithms able to learn and adapt to changes in the environment. This programming technique is based on the assumption of being able to receive stimuli from the outside according to the choices of the algorithm. So, a correct choice will result in a prize, while an incorrect choice will lead to a penalization of the system. The goal of the system is to achieve the highest-possible prize and consequently the best-possible result.
With such a model, the computer learns, for example, to beat an opponent in a game (or to drive a vehicle) concentrating its efforts on performing a given task, aiming to achieve the maximum reward value; in other words, the system learns by playing (or driving) and by the mistakes made improving performance...