Reinforcement Learning (RL) aims to create systems that learn and, at the same time, adapt to changes in their environment, guided by a reward assigned to each action they perform.
Software systems that process information in this way are called intelligent agents.
These agents decide which action to take based on the following:
- State of the system
- Learning algorithm used
To change the system state and maximize its long-term reward, an agent selects the action to perform by continuously monitoring its environment.
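The observe, act, and receive-reward cycle described above can be sketched as a short loop. This is a minimal illustration only: `CorridorEnv`, `random_policy`, and `run_episode` are hypothetical names invented for this example, and the environment (a five-cell corridor with a reward at the far end) is an assumption, not something taken from a particular library.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: positions 0..4, reward 1.0 on reaching 4."""

    def __init__(self):
        self.state = 0

    def step(self, action):
        # action is -1 (move left) or +1 (move right); position is clamped to [0, 4]
        self.state = max(0, min(4, self.state + action))
        reward = 1.0 if self.state == 4 else 0.0
        done = self.state == 4
        return self.state, reward, done


def random_policy(state, rng):
    # Placeholder for a learning algorithm: ignores the state and acts at random
    return rng.choice([-1, 1])


def run_episode(env, policy, rng, max_steps=100):
    """Run one episode: observe the state, act, collect the reward, repeat."""
    total_reward = 0.0
    state = env.state
    for _ in range(max_steps):
        action = policy(state, rng)             # decide from the current state
        state, reward, done = env.step(action)  # the environment changes state
        total_reward += reward                  # accumulate the reward signal
        if done:
            break
    return total_reward


rng = random.Random(0)
total = run_episode(CorridorEnv(), random_policy, rng)
```

Here the policy is random, so nothing is learned yet; a learning algorithm would replace `random_policy` with a function that uses past rewards to choose better actions.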
To obtain a large reward and, therefore, optimize the Reinforcement Learning procedure, the agent must prefer actions that, in the past, have produced a good reward.
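One simple way to "prefer actions that have produced a good reward" is to keep a running average of the reward observed for each action and pick the action with the highest average. The sketch below assumes a small, fixed set of discrete actions; `GreedyAgent` and its methods are hypothetical names used only for illustration.

```python
class GreedyAgent:
    """Tracks the average reward per action and prefers the best one so far."""

    def __init__(self, n_actions):
        self.counts = [0] * n_actions    # how many times each action was tried
        self.values = [0.0] * n_actions  # average reward observed per action

    def select_action(self):
        # Pick the action with the highest estimated reward
        # (ties go to the lowest index)
        return max(range(len(self.values)), key=lambda a: self.values[a])

    def update(self, action, reward):
        # Incremental average: new_avg = old_avg + (reward - old_avg) / n
        self.counts[action] += 1
        self.values[action] += (reward - self.values[action]) / self.counts[action]


agent = GreedyAgent(n_actions=3)
for action, reward in [(0, 1.0), (1, 5.0), (2, 2.0), (1, 3.0)]:
    agent.update(action, reward)

best = agent.select_action()
```

With the rewards fed in above, action 1 has the highest average and is the one selected. Note that an agent that only ever picks the current best action may never revisit an action whose first reward happened to be low.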
Such actions are discovered by trying those never selected before. Therefore, the agent must exploit what it already knows, both to obtain the maximum reward, and also...