In reinforcement learning, an agent receives the observation and reward from the previous time-step as input and produces an action as output, with the goal of maximizing the cumulative reward.
The agent has a policy, value function, and model:
- The algorithm used by the agent to pick the next action is known as the policy. In the previous section, we wrote a policy that took a set of parameters theta and returned the next action based on the product of the observation and the parameters. The policy is represented by the following equation:
$\pi : S \rightarrow A$,
where $S$ is the set of states and $A$ is the set of actions.
A policy is either deterministic or stochastic.
- A deterministic policy returns the same action for the same state in each run: $\pi(s) = a$, with $s \in S$ and $a \in A$.
- A stochastic policy returns a probability distribution over the actions for a given state, so the action chosen for the same state can differ between runs: $\pi(a \mid s) = P(a_t = a \mid s_t = s)$, the probability of taking action $a$ in state $s$.
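The difference between the two kinds of policy can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it assumes two discrete actions (0 and 1), a deterministic rule that thresholds the product of theta and the observation at zero, and a stochastic rule that passes the same product through a logistic function. The function names and the sample values are hypothetical and are not taken from the previous section.

```python
import math
import random


def score(theta, observation):
    """Dot product of the parameters and the observation."""
    return sum(t * o for t, o in zip(theta, observation))


def deterministic_policy(theta, observation):
    """Always maps the same state to the same action (assumed threshold at 0)."""
    return 0 if score(theta, observation) < 0 else 1


def action_probabilities(theta, observation):
    """Return [P(a=0 | s), P(a=1 | s)] using a logistic function (an assumption)."""
    p1 = 1.0 / (1.0 + math.exp(-score(theta, observation)))
    return [1.0 - p1, p1]


def stochastic_policy(theta, observation, rng=random):
    """Sample an action from the distribution over actions for this state."""
    probs = action_probabilities(theta, observation)
    return rng.choices([0, 1], weights=probs)[0]


# Hypothetical parameters and observation, chosen only for illustration.
theta = [0.5, -0.2, 0.1, 0.3]
observation = [0.02, 0.01, -0.03, 0.04]

# The deterministic policy gives the same action on every call.
print([deterministic_policy(theta, observation) for _ in range(5)])

# The stochastic policy may give different actions for the same state.
print([stochastic_policy(theta, observation) for _ in range(5)])
```

Calling `deterministic_policy` repeatedly with the same arguments always yields the same action, whereas repeated calls to `stochastic_policy` draw from the probabilities returned by `action_probabilities`, so the result can vary from call to call.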
- The...