Reinforcement learning algorithms concept
Let's build a simple model of reinforcement learning and introduce the basic terminology along the way:
At each time step t, the agent:
- Executes action a_t
- Receives observation o_t
- Receives a scalar reward r_t
At each time step t, the environment:
- Receives action a_t
- Generates observation o_{t+1}
- Generates scalar reward r_{t+1}
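The exchange between agent and environment described above can be sketched as a simple loop. This is a minimal illustration, not a reference implementation: the class names ToyLineEnvironment and RandomAgent, the five-cell line world, and the reward rule are all hypothetical and chosen only to make the loop runnable.

```python
import random


class ToyLineEnvironment:
    """Hypothetical environment: a line of cells 0..4 with a reward at cell 4."""

    def __init__(self):
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state  # initial observation

    def step(self, action):
        # Receive the action, then generate the next observation and a scalar reward.
        self.state = max(0, min(4, self.state + action))
        observation = self.state
        reward = 1.0 if self.state == 4 else 0.0
        return observation, reward


class RandomAgent:
    """Hypothetical agent that ignores its inputs and picks left (-1) or right (+1)."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def act(self, observation, reward):
        return self.rng.choice([-1, 1])


def run_episode(env, agent, num_steps):
    observation, reward = env.reset(), 0.0
    history = []
    for t in range(num_steps):
        action = agent.act(observation, reward)   # agent executes its action
        observation, reward = env.step(action)    # environment answers with the next observation and reward
        history.append((t, action, observation, reward))
    return history


history = run_episode(ToyLineEnvironment(), RandomAgent(seed=0), num_steps=10)
```

Each entry of history records one time step: the action the agent took, followed by the observation and reward the environment produced in response. A learning agent would use the observation and reward passed to act to improve its choices; the random agent here only shows where those signals flow.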
The environment is considered to be non-deterministic: an action a_t chosen on the basis of observation o_t is followed by a reward, but the same action taken in the same state may result in a different reward each time.
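To make the idea of a non-deterministic environment concrete, the sketch below repeats one fixed action in one fixed state many times and collects the rewards. The function noisy_reward and its probabilities (0.7 and 0.2) are assumptions invented for this illustration; they do not come from any particular environment.

```python
import random


def noisy_reward(state, action, rng):
    """Hypothetical stochastic reward: the same (state, action) pair can pay 0.0 or 1.0."""
    # Assumed probabilities: a matching action succeeds more often than a mismatched one.
    success_probability = 0.7 if action == state else 0.2
    return 1.0 if rng.random() < success_probability else 0.0


rng = random.Random(42)

# Same state, same action, repeated 1000 times.
rewards = [noisy_reward(state=1, action=1, rng=rng) for _ in range(1000)]
mean_reward = sum(rewards) / len(rewards)
```

Individual rewards differ from one trial to the next even though the state and action never change, while their average settles near the underlying success probability. This is why an agent in such an environment has to estimate the value of an action from many samples rather than from a single outcome.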
The agent (the intelligent machine) is connected to its environment through observations and actions. The agent perceives the environment in its own way and chooses an action using one of several popular, still-evolving techniques. At each time step, the agent receives signals that represent the state of the environment.
The agent responds with an action, which is one of several options available at that point in time. The action generates an output...