Reinforcement learning is an interactive strategy. An agent navigates through an environment (for example, a robot moving around a room or a video game character going through a level). The agent has a predefined list of actions it can make (walk, turn, jump, and so on) and, after each action, it ends up in a new state. Some states can bring rewards, which are immediate or delayed, and positive or negative (for instance, a positive reward when the video game character touches a bonus item, and a negative reward when it is hit by an enemy).
At each instant, the neural network is provided only with observations from the environment (for example, the robot's visual feed, or the video game screen) and reward feedback (the carrot and stick). From this, it has to learn what brings higher rewards and estimate the best short-term or long-term policy for the agent accordingly. In other words, it has to estimate the series of actions that would maximize its...