Reinforcement learning 101
In reinforcement learning, an agent receives the observation and the reward from the previous time-step as input and produces an action as output, with the goal of maximizing the cumulative reward.
The agent has a policy, value function, and model:
- The algorithm used by the agent to pick the next action is known as the policy. In the previous section, we wrote a policy that took a set of parameters theta and returned the next action based on the product of the observation and the parameters. The policy is represented by the following equation: $\pi : S \rightarrow A$, where S is the set of states and A is the set of actions.
- A policy is either deterministic or stochastic.
- A deterministic policy returns the same action for the same state in each run: $\pi(s) = a$
- A stochastic policy returns a probability distribution over the actions for a given state, so the action chosen for the same state can differ from run to run: $\pi(a \mid s) = P(A_t = a \mid S_t = s)$
- The value function predicts the amount of long-term reward based on the selected action in the current state. Thus, the value function...
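The difference between a deterministic and a stochastic policy described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two-action setup, the values of `theta` and `obs`, and the use of a logistic function to turn the score into probabilities are all hypothetical choices, not the policy from the previous section.

```python
import math
import random


def score(theta, obs):
    """Dot product of the parameters and the observation."""
    return sum(t * o for t, o in zip(theta, obs))


def deterministic_policy(theta, obs):
    """Always map the same state to the same action (0 or 1)."""
    return 1 if score(theta, obs) >= 0 else 0


def action_probabilities(theta, obs):
    """Return [P(a=0 | s), P(a=1 | s)].

    Assumption: a logistic function of the score gives P(a=1 | s).
    """
    p1 = 1.0 / (1.0 + math.exp(-score(theta, obs)))
    return [1.0 - p1, p1]


def stochastic_policy(theta, obs, rng=random):
    """Sample an action from the distribution over actions for this state."""
    probs = action_probabilities(theta, obs)
    return rng.choices([0, 1], weights=probs, k=1)[0]


# Hypothetical parameters and a hypothetical 4-dimensional observation.
theta = [0.5, -0.2, 1.0, 0.5]
obs = [0.02, 0.1, -0.03, 0.4]

# The deterministic policy yields a single distinct action over many calls.
det_actions = {deterministic_policy(theta, obs) for _ in range(100)}

# The stochastic policy can yield different actions for the same state.
rng = random.Random(0)  # seeded so that the run is repeatable
sto_actions = [stochastic_policy(theta, obs, rng) for _ in range(1000)]

print("deterministic actions seen:", det_actions)
print("stochastic actions seen:", set(sto_actions))
```

Both policies use the same score here, so the only difference is whether the score selects the action directly or defines the probabilities from which the action is sampled.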